<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Mining Applications in Healthcare: Research vs Practice</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olegas NIAKŠU</string-name>
          <email>Olegas.Niaksu@mii.vu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga KURASOVA</string-name>
          <email>Olga.Kurasova@mii.vu.lt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vilnius University, Institute of Mathematics and Informatics</institution>
          ,
          <addr-line>Akademijos str. 4, LT-08663, Vilnius</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <fpage>58</fpage>
      <lpage>70</lpage>
      <abstract>
        <p>The paper questions the commonly accepted belief that data mining is widely used in medicine [8] by comparing academic advances with practical achievements in the field. It discusses the practical usage and potential gains of data mining in healthcare facilities, along with the growing number of publications that indicates increasing interest in the topic within the scientific community. To evaluate the practical use of data mining in healthcare, a survey of tertiary hospitals in 5 countries was conducted. Countries from regions at different levels of economic development were selected, covering 7 tertiary hospitals with differing economic potential. A quantitative analysis of publications on data mining applications in healthcare was carried out for the last 8 years.</p>
      </abstract>
      <kwd-group>
        <kwd>Data mining applications</kwd>
        <kwd>medical information systems</kwd>
        <kwd>medical informatics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The healthcare domain is known for its ontological complexity, its variety of medical
data standards, and its variable data quality [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ]. With privacy considerations added to this,
effective and practically usable medical knowledge discovery has remained an open
problem for decades. Modern clinical practice is also undergoing transformation,
not only in diagnosis and treatment methods but also in the understanding of the concepts
of health and illness [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        Though data mining (DM) methods and tools have been applied in different
domains for more than 40 years, their applications in healthcare are relatively
young. R. D. Wilson et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] classified and collected medical publications
in which knowledge discovery and DM techniques were applied or researched between 1966
and 2002. According to their study results, “…some authors refer to DM as the process
of acquiring information, whereas others refer to DM as utilization of statistical
techniques within the knowledge discovery process.” This mix of different
concepts makes research complicated and less reliable. We therefore decided to
complement the typical detailed analysis of scientific and commercial publications with
a survey of large healthcare facilities that conduct scientific and commercial
research studies.
      </p>
      <p>To avoid misinterpretation, the concept of data mining was defined and
explained to the survey participants as follows: “Data mining, as part of knowledge
discovery process, is a set of data analysis methods using statistical methods and
heuristics, which are used for prediction, classification, clustering tasks or finding
hidden patterns and correlations in raw data”. The following examples of typical DM usage
in healthcare were provided: “patient diagnostics, prediction of patient condition,
prediction of post-operational complications…”.</p>
      <p>Gathering information from hospitals allows us to put academic effort and
practical usage side by side, to draw conclusions about actual DM usage, and to understand whether
there is a gap between the community of data analysis experts and healthcare practitioners
and scientists.</p>
      <p>In this paper, we combine the quantified results of a publication search
covering DM applications in healthcare with the results of a survey of tertiary<sup>1</sup>
hospitals on practical DM usage. The combination of these
different sources should help us formulate a hypothesis for a further, more specific and
larger-scale survey on the magnitude of actual DM applications in healthcare.</p>
      <p>
        Since the beginning of the 21st century, many countries have chosen e-Health as a priority
national program, which in essence aims to benefit from the standardization and
aggregation of patients’ clinical information and of the healthcare services rendered, by
providing instant access to that information to healthcare professionals as well as to
patients themselves [
        <xref ref-type="bibr" rid="ref12 ref19 ref6">6, 12, 19</xref>
        ]. According to the report [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] from the National Center
for Health Statistics of the USA, the adoption in the USA of the Electronic Health Record (EHR),
the most prominent medical information system, is shown in Figure 1. It illustrates a
linearly rising amount of non-sparse, continuous data reflecting patients’ clinical
continuity together with the treatment that took place and the medication used.
      </p>
      <p>
        According to the strategic plans of the EU member states, the USA and many other
nations on all continents, considerable investment is allocated to enable
the global computerization of healthcare data. Assuming linear progression,
within 10 years all new medical encounters would be thoroughly digitized in all
developed countries. Exponential growth is doubtful, mostly because of the lack of
governance structures, data protection and patient privacy issues, and resistance from
within the medical community [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. But even under a conservative scenario, it is
becoming obvious that, for the first time in history, the research community is going to
obtain a person’s full medical history from birth until death.
And this applies not to a small specific group limited by a longitudinal research study, but
to whole regions, nations, countries and even continents. This anticipated scenario
promises tremendous potential for machine learning and in particular for DM
applications in healthcare.
        <fn id="fn1"><label>1</label><p>Tertiary hospital: a major hospital providing a wide range of high-level specialized medical services.
Commonly, tertiary hospitals are university hospitals combining medical and academic activities.</p></fn>
      </p>
    </sec>
    <sec id="sec-2">
      <title>1. Scope of Analysis</title>
      <p>
        Thomson Reuters Web of Science [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], Google Scholar [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and PubMed [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]
databases were used to analyze the number and distribution of scientific publications
related to DM in medicine in the last decade.
      </p>
      <p>Tertiary hospitals were selected as the primary source for our survey. The main
reason is that tertiary hospitals are typically among the first healthcare institutions
to implement clinical software systems, which make it possible to collect the clinical and
demographic patient data needed for DM applications. Historically, hospital
information systems evolved from exotic showcases in economically
well-established communities into the standardized practice of handling clinical data and
workflows: since the mid-1990s in developed countries, and since the first decade of the
21st century in developing countries and emerging markets.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Scientific Relevance and Development</title>
      <sec id="sec-3-1">
        <title>2.1. PubMed Database</title>
        <p>
          The PubMed database comprises more than 21 million citations for biomedical
literature from MEDLINE, life science journals, and online books. PubMed is operated
by the U.S. National Library of Medicine and indexes all publications, classifying their
content with the help of the MeSH structured vocabulary [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Using MeSH vocabulary
terms as a search parameter in the PubMed database guarantees that the search finds not only
publications matching the search wording, but also those indexed under synonymous wording
or previously used terms. The MeSH heading "data mining" is
mapped to other similar concepts such as "text mining". The term "data mining" was added
to the vocabulary only in 2010, and former terms, e.g. "Information Storage and
Retrieval", previously used for the same or related concepts, are mapped to
the latest one. A simple search criterion, "data mining", was used to retrieve the
publications and books within the medical domain with the assigned MeSH heading "data
mining". The first publication is dated 1984; however, the second one appears only after
a 10-year interval, in 1994. This search resulted in 3077 publications.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Thomson Reuters Web of Science Database</title>
        <p>the search scope (title of the publication or whole text). The following constraints have
been chosen for our analysis purposes:</p>
        <preformat>(TS=(data mining) AND TS=(medic* OR clinical OR healthcare)) AND Document Types=(Article OR Abstract of Published Item OR Proceedings Paper)
Refined by: [excluding] Web of Science Categories=( OPERATIONS RESEARCH MANAGEMENT SCIENCE OR TELECOMMUNICATIONS )
Timespan=1996-2012. Databases=SCI-EXPANDED, SSCI, A&amp;HCI, CPCI-S, CPCI-SSH. Lemmatization=On</preformat>
        <p>This search resulted in 2272 publications.</p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Google Scholar</title>
        <p>Google Scholar provides a scholarly literature search service across many disciplines
and sources, including theses, books, abstracts and articles; however, it is not limited to
scientific publications. Google Scholar indexes content items published since
1993. The Google search filter supports the logical operators AND, OR and NOT, a
search restricted to selected subject areas, and a choice of search scope (title of the publication
or whole text). The following constraints were chosen for our analysis:
search in the title: "data mining" AND (medical OR clinical OR medicine OR
healthcare).</p>
        <p>This search resulted in 478 publications.</p>
        <p>Searching the whole article text was rejected due to a serious flaw:
a huge number of DM-centric publications contain the keywords “medicine” or “healthcare”
in the text merely to illustrate DM usage. Their presence therefore proved insufficient to
indicate that a publication is focused on DM applications in medicine.</p>
        <p>Google reports the number of publications in the selected period only
approximately. According to our observations, precision increases with the
number of relevant content items found.</p>
      </sec>
      <sec id="sec-3-4">
        <title>2.4. Results</title>
        <p>The distribution of publications found in the Web of Knowledge, Google Scholar and PubMed
databases from 1997 to 2011 is shown in Figure 2.</p>
        <p>We can see mostly linear growth in all cases, with different line slopes m: for
Google Scholar the growth is very modest, m ≈ 3.38; it is more significant for Web of
Knowledge, m ≈ 18.28; and it is largest for the PubMed database, m ≈ 30.5. As is clear from the
description of the queries used in the individual databases, the results are not directly
comparable; they are shown here to illustrate the constantly rising interest of the academic
community in the topic of DM applications in the medical domain.</p>
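        <p>The text does not state how the line slopes m were obtained. The following is a minimal sketch, assuming an ordinary least-squares fit of yearly publication counts against the year. The function name <monospace>fit_slope</monospace> is hypothetical, and the yearly counts are synthetic values generated on an exact line; they are not the counts behind Figure 2.</p>
        <preformat>
```python
def fit_slope(years, counts):
    """Return the ordinary least-squares slope and intercept of counts vs. years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, exactly linear series (an assumption for illustration,
# NOT the publication counts reported in the paper).
years = list(range(1997, 2012))
counts = [30.5 * (year - 1997) for year in years]

slope, intercept = fit_slope(years, counts)
print(slope)  # slope in publications per year
```
</preformat>
        <p>On real counts the fitted slope would only approximate the trend, which is presumably why the values in the text are given with the ≈ sign.</p>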
      </sec>
      <sec id="sec-3-5">
        <title>2.5. Public Interest in the Topic of DM</title>
        <p>
          Although exact numbers are difficult to estimate, we can get a fair
understanding using the publicly available tool Google Trends [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], which analyzes all
search queries executed in the Google search engine worldwide. The Google Trends service has
been collecting and mining data since 2004, providing time-series analysis that reflects
the overall popularity of different search topics, or “trends” as Google names them. The
dimensions of geographical location, source and language are taken into consideration.
We can get a better understanding by comparing the popularity of the term with that of other
disciplines, such as artificial intelligence and machine learning, or of a broader concept such as
computer science. This type of analysis provides a unique source of unified data that
combines search queries in different languages from different world locations.
        </p>
        <p>Google Trends does not provide absolute values on the vertical axis; instead, the chart
data are scaled to the average search traffic for the term “data mining” (represented as 1.0)
during the period from 2004 to 2012.</p>
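        <p>The scaling described here can be sketched as follows, assuming that every series is simply divided by the mean of the reference term's series over the chosen period. The function name <monospace>scale_to_reference</monospace> and all numeric values are hypothetical; Google Trends does not expose raw search volumes, so the inputs are invented purely to show the arithmetic.</p>
        <preformat>
```python
def scale_to_reference(series, reference):
    """Divide every value in series by the mean of the reference series."""
    ref_mean = sum(reference) / len(reference)
    return [value / ref_mean for value in series]


# Hypothetical raw search volumes per period (illustration only).
data_mining = [120.0, 100.0, 80.0]
machine_learning = [60.0, 50.0, 40.0]

# The reference term averages exactly 1.0 after scaling.
dm_scaled = scale_to_reference(data_mining, data_mining)
# Other terms are expressed relative to the reference term's average.
ml_scaled = scale_to_reference(machine_learning, data_mining)
print(dm_scaled, ml_scaled)
```
</preformat>
        <p>With this convention a value of 0.5 means half the average search traffic of “data mining”, so curves for different terms remain comparable on one axis.</p>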
        <p>As shown in Figure 3, there is a general correlation among all the concepts
analyzed: artificial intelligence, machine learning and computer science. Adding
such disciplines as “mathematics” or “physics” to the analysis shows the same
correlation. A slight decrease in general public interest in scientific or,
rather, scholarly topics is noticeable over the years.</p>
        <p>Also, looking at Figure 4, which shows the trends in more detail for 2011, we can
see a decrease of interest in all topics during the summer months, which most probably
directly reflects the summer holidays of the academic community.</p>
        <p>This distribution can be interpreted in different ways. However, it offers an
additional perspective for understanding which regions will be more active in the field
in the near future.</p>
        <p>Summarizing the Google Trends results, we can conclude that the peak popularity of the
“DM” and “artificial intelligence” concepts ended by 2007, after which interest has remained
more or less stable.</p>
        <p>Trying to find correlations between the popularity of DM in the academic world and
among the general public would not be correct, because the two trends reflect different
activities: the creation of scientific results versus a generic interest in the topic without any
obligation or intention to produce a result from it.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Surveying DM Applications in Healthcare Facilities</title>
      <p>As shown above, the volume of medicine-related DM research increases from year to
year. Naturally, one might suppose that the penetration of DM usage is increasing accordingly.
However, a data analyst working in the field will agree that a large number of research
studies remain academic, have no clinical follow-up, and rarely go beyond
the institutions that were directly involved in the research. This raises
reasonable questions and doubts about the rational and measurable outcome of the
research effort. Undoubtedly, a number of proven niche DM applications are award-winning
successes, for example in radiology imaging or genetic analysis. But the same cannot be
said of the hundreds of specific clinical DM research studies. Up to the date of our research, we
could not find an example of a systematic attempt to understand the history
or the current state of DM utilization by healthcare institutions. This can be
considered a blurring factor that prevents the scientific community from concentrating on
the right ways of tailoring the knowledge discovery process to healthcare, which would
bring the maximum benefit to clinicians as end users of DM tools and methods and,
ultimately, to patients as the final beneficiaries.</p>
      <p>
        Because the healthcare sector is very diverse and its entities and
actors have different objectives and fields of activity, they employ different methods
and tools in their operations [
        <xref ref-type="bibr" rid="ref1 ref11 ref13 ref14 ref15 ref16 ref2">1, 2, 11, 13, 14, 15, 16</xref>
        ]. It was therefore initially agreed
to define the scope of this research as DM applications in healthcare provider
institutions. However, for this scope a statistically valid representation would require
a significant number of institutions of different types, such as general practitioners' offices,
private or public clinics, local and specialized hospitals, regional and, finally, tertiary
hospitals. The initial experience of interviewing healthcare institutions suggested that
the probability of DM usage is highest in tertiary hospitals, which have close
relations with the academic community and participate in different kinds of scientific and
commercial research on a regular basis. This does not imply that DM
applications do not occur in smaller institutions, but focusing on tertiary hospitals
allowed us to estimate the upper range of DM penetration into the healthcare provider
sector. The next important constraint on the survey scope is the geographical spread of
healthcare institutions (HCI). The initial research objective, to understand the
practical usage of DM techniques and tools in HCI across the globe, sets a very
ambitious but unrealistic target of surveying thousands of healthcare facilities. It was
decided that, for a study with limited resources, we should at least select tertiary hospitals
from different zones of economic development, with different amounts of
electronically available patient-related data for further analysis. It was therefore
extremely important to select well-financed hospitals from the leading
economies as well as relatively modestly financed hospitals from developing
countries. Hospitals from the following countries participated in the survey: South
Africa, Lithuania, Switzerland, Albania, and Germany. This survey cannot
be treated as final, as we plan to continue gathering information in the coming years;
however, it reveals clear patterns that lead us to concrete conclusions and
summaries that may be useful for both the data analyst and the clinician
communities.
      </p>
      <sec id="sec-4-1">
        <title>3.1. Preparation and Conducting the Survey</title>
        <p>Already in the early stages it became obvious that there is a huge gap in the understanding
of the DM concept by its intended end users, the clinicians. Typically, a hospital's IT department
has the knowledge and is able to describe how DM is used in the hospital. In contrast,
medical personnel are usually minimally informed about what
exactly DM is and, more specifically, how it is used in the hospital. Accordingly, we
could not ignore either class of respondents,
and it was important to obtain both types of answers for later analysis. Afterwards,
we analyzed the answers both separately for each class of respondents and
combined. Taking this diverse audience into
consideration, the questions were formulated so as to be comprehensible to a broad range
of respondents with a medical or IT background. The summarized questions are listed below.</p>
        <p>Usage of statistical data analyses, DM and clinical decision support systems:</p>
        <p>1. Have you heard about practical applications of DM in medicine?</p>
        <p>2. Do you know any research projects in your hospital using DM methods?</p>
        <p>3. Have you or your colleagues been involved in a DM research project aiming to
identify new patterns or find new rules for patient diagnostics, prediction of
treatment results or other purposes? If yes, please provide a brief summary of the research aim
and the results.</p>
        <p>4. If DM methods have been used, was your experience successful? Please comment.</p>
        <p>5. Has a clinical decision support IT system been used in your hospital?</p>
        <p>6. Please specify which clinical specialties could benefit from using DM methods on
the patient clinical data collected in your hospital (choose from the list).</p>
        <p>7. What type of clinical research is your hospital involved in?</p>
        <p>8. Are you or your colleagues potentially interested in the benefits that DM could
provide to you?</p>
        <sec id="sec-4-1-1">
          <title>Availability of Electronic Patient Data for Research:</title>
          <p>9. How many years have patient data been collected in IT systems in your
organization?</p>
          <p>10. Please specify what clinical patient information is stored in IT systems (HIS, EHR,
EMR, RIS, etc.). Select from the list: Observations, Lab results, Radiology reports,
Anamnesis, Surgery reports, Discharge summary, Visit summary, Nursing data
(vitals), Medication used (for inpatients).</p>
          <p>11. Mark the medical IT systems used in your organization. Select from the list: EMR /
EPR, HIS, RIS/PACS, LIS, Specific clinical information systems, Emergency IS,
OP clinic information system, Blood bank information system, Clinical decision
support system, Pathology information system.</p>
          <p>12. Specify what standard nomenclature is used in your organization (e.g. ICD-9,
ICD-10, SNOMED CT, LOINC). Select from the list: Patient diagnosis, Pathologic
diagnosis, Procedure coding, Laboratory coding.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Interest in DM:</title>
          <p>13. Are you interested in international clinical DM research projects?</p>
          <p>14. Specify the clinical specialty or problem you are interested in.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Method of Survey</title>
        <p>
          The survey was conducted according to the methodological guidelines of the Centre for Health
Promotion of the University of Toronto [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. A call for participation was openly published on the
eHealth news portal eHealthServer.com [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. The survey was prepared as an online
questionnaire and as offline forms. The need for an offline version was pointed out by some
institutions with limited or no internet access. Hospitals were asked to have at least 2
respondents from each institution fill in the questionnaire: a person in charge of
medical services, e.g. the medical superintendent, director of medicine or head of a clinical
department, and a person in charge of information technology, e.g. the chief of the IT
department. In parallel, direct enquiries were sent to the officials of hospitals in 8
countries. Completing the interviews took five months instead of the two months planned, due
to very little or no reaction from the respondents, especially from medical
representatives.
        </p>
        <p>The survey questions allowed the correctness of the information
provided to be cross-checked. For example, question #4 asks explicitly whether DM tools are used and question #5 asks whether
a clinical decision support system is in use. Typically, a clinical decision support
system would incorporate a few DM algorithms as well as statistics.</p>
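        <p>One way such a cross-check could be automated is sketched below. It is an illustration under stated assumptions, not the validation procedure actually used: the rule (a response reporting a clinical decision support system in use but no DM usage is flagged for manual review), the field names <monospace>dm_used</monospace> and <monospace>cdss_in_use</monospace>, and the three sample responses are all hypothetical.</p>
        <preformat>
```python
def needs_review(answers):
    """Flag a response that reports a CDSS in use but no DM usage.

    Assumed rule: a clinical decision support system typically embeds
    DM algorithms, so this combination deserves a manual follow-up.
    """
    return answers["cdss_in_use"] and not answers["dm_used"]


# Hypothetical responses (invented for illustration, not survey data).
responses = [
    {"respondent": "A", "dm_used": True, "cdss_in_use": True},
    {"respondent": "B", "dm_used": False, "cdss_in_use": True},
    {"respondent": "C", "dm_used": False, "cdss_in_use": False},
]

flagged = [r["respondent"] for r in responses if needs_review(r)]
print(flagged)  # prints ['B']
```
</preformat>
        <p>A flag of this kind only marks an answer pair as worth a second look; whether the answers are actually inconsistent still has to be judged by a person.</p>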
        <p>The aim of questions in the section “Usage of statistical data analyses, DM and
clinical decision support systems” is to clarify the eligibility of the institution for DM,
awareness of DM concept and known applications of DM.</p>
        <p>Questions in the section “Availability of Electronic Patient Data for Research”
help to figure out the potential of DM in the institution, based on the amount of
electronically available data, the medical information system, and standardized medical
nomenclature being used.</p>
        <p>The aim of the questions in the section “Interest in DM” is to determine the respondents'
interest in possible future DM research projects.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Analysis of Survey Data</title>
      <p>Out of 14 respondents, 12 confirmed that they had heard about practical
applications of DM. However, after quality validation and cross-checking of the answers,
only 9 positive answers could be accepted. Even among these 9 respondents,
only 4 were familiar with practical examples of such usage,
i.e. 29% of all respondents. Data validation also revealed that the
majority of medical respondents had no information about DM research
initiatives and applications in their own facilities. It is difficult to quantify the overall
level of awareness on the basis of this survey; however, the chosen method of surveying two
or more representatives from each facility showed that medical
specialists not involved in a DM project in their own HCI typically have no information about it.</p>
      <p>Summarized answers, grouped by hospitals with a similar economic situation, are
provided in Table 1. Country groups are represented as follows:
        <list list-type="bullet">
          <list-item><p>developing countries: Albania;</p></list-item>
          <list-item><p>emerging countries: Lithuania and South Africa;</p></list-item>
          <list-item><p>western countries: Switzerland and Germany.</p></list-item>
        </list>
      </p>
      <p>Evaluating the benefits of gained DM experience, 50% of respondents, who
declared a personal involvement in DM projects, were satisfied with the results
achieved and 50% had a neutral opinion on the project success.</p>
      <p>The analysis of which clinical specialties have the highest potential for DM usage was not
successful. Validation of the answers showed that the selected clinical
specialties were typically those relevant either to the clinical profile of the respondent or to the clinical
profile of the hospital. Summarizing the answers provided, we can conclude that all
clinical specialties, without exception, have potential for DM.</p>
      <p>The interest in additional information on potential DM benefits was expressed by
86% of respondents, regardless of their initial experience with DM.</p>
      <p>The analysis of electronic data availability for DM purposes showed a
correlation between the reported number of years of clinical data collection in a facility and the level
of the region's economic development (Figure 6). The data collection timeframe
ranges from 1 to 15 years, with a mean value of 8 years and a median value of 4 years. In
terms of the medical IT systems being used, 100% of respondents stated that a
hospital information system is in use; electronic medical record systems are used in
60% of facilities and radiology imaging systems in 83%.</p>
      <p>The usage of standard terminology dictionaries varies depending on the
country of the facility. ICD-9 and ICD-10 are very commonly used for coding
disease diagnoses. However, other nomenclatures that are critical for DM applications and
are used to code procedures and interventions, laboratory tests and pathology diagnoses are only
partly implemented, and at varying levels of quality.</p>
      <p>93% of respondents expressed their willingness to participate in international clinical DM
research projects.</p>
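      <p>The percentages reported in this section are consistent with a base of 14 respondents. The short check below recomputes them. The count of 4 for the 29% figure is stated in the text; the counts of 12 and 13 behind 86% and 93% are not stated and are inferred here as an assumption. The helper name <monospace>share</monospace> is hypothetical.</p>
      <preformat>
```python
def share(count, total):
    """Percentage of respondents, rounded to the nearest whole percent."""
    return round(100 * count / total)


TOTAL_RESPONDENTS = 14  # stated in the text

# 4 of 14 respondents knew practical examples of DM usage (stated).
print(share(4, TOTAL_RESPONDENTS))   # prints 29
# Counts below are inferred from the reported percentages (assumption).
print(share(12, TOTAL_RESPONDENTS))  # prints 86
print(share(13, TOTAL_RESPONDENTS))  # prints 93
```
</preformat>
      <p>With only 14 respondents, a single answer shifts a percentage by about 7 points, which is worth keeping in mind when comparing these figures.</p>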
    </sec>
    <sec id="sec-6">
      <title>5. Survey Findings</title>
      <p>As presumed, the understanding of DM as a concept, as well as of its potential, depends
on the background of the respondents. The IT personnel of a hospital are typically well
informed about DM-related research and usage inside the hospital: this was the case for 100% of
the IT department representatives surveyed. In contrast, clinicians are usually informed
only if they were directly involved in such projects.</p>
      <p>All the respondents have confirmed that they had heard about practical
applications of DM in medicine. However, only 29% of respondents were able to
provide any example of practical DM usage.</p>
      <p>There is noticeable confusion among healthcare professionals in differentiating
the concepts of DM and statistics, and they very rarely regard DM as a practically
valuable tool for clinical purposes.</p>
      <p>The respondents from healthcare facilities with a relatively recent adoption of IT in
the patient treatment process tend to mix statistical reporting and DM, hospital
information systems, Electronic medical record systems and decision support systems.</p>
      <p>Regardless of understanding and experience of DM, 86% of respondents expressed
their interest in the DM topic and 93% would like to participate in international DM
research projects as well as to be informed about utilization of DM techniques in the
future.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions</title>
      <p>The analysis of publications in the field of DM applications in the medical domain has
shown steady growth from its traceable beginning until the present. The line slope of
publication growth averages m ≈ 17 over the searches conducted in the PubMed,
Web of Science, and Google Scholar databases. In the early 1990s, up to 5 publications
were produced per year, compared with around 400 publications in 2011. We can conclude
that a tremendous growth of interest and scientific advancement took place in the last
decade.</p>
      <p>On the practical side of the DM value chain, the survey revealed that the greater part of the medical
community of tertiary hospitals has either minimal or zero awareness of practical DM
usage and its potential. All the respondents from the largest
university hospitals confirmed that they were familiar with DM applications in healthcare;
however, only 29% of them were able to provide any example of practical DM usage. A
huge gap in awareness and understanding of the DM potential was encountered even
inside healthcare facilities, placing IT and clinical personnel at opposite poles. If
we extrapolate these results to smaller, less well financed healthcare providers that are less exposed to research
projects, DM usage will be significantly lower. The survey
identified considerable potential for further DM penetration, owing to the increasing
amount of patient clinical data collected in HCI and the interest declared by hospitals’
clinical representatives: 86% of respondents expressed their interest in DM and even
more would like to participate in international DM research projects.</p>
      <p>However, the process of information digitization in developing countries is
still in its early phases, and the lack of electronically available data is a limiting factor
for the spread of DM in economically weaker regions.</p>
      <p>
        In summary, we have shown that the perception and practical
application of data mining in healthcare lag far behind its steady growth in academic
research. This raises the hypothesis that only a relatively small percentage of academic research
effort results in practical DM applications in healthcare, from which we can conclude
that the current interdisciplinary approach is not efficient enough. Considering
the potential and benefits of knowledge discovery using DM tools in healthcare, it is
clear that more attention should be paid to the domain-specific problems of successful
DM application in healthcare [
        <xref ref-type="bibr" rid="ref10 ref4 ref5">4, 5, 10</xref>
        ], emphasizing the usage of DM methods with
self-explanatory models [
        <xref ref-type="bibr" rid="ref16 ref7 ref9">7, 9, 16</xref>
        ] in contrast to black-box methods.
      </p>
      <p>Further research will aim to collect additional survey data from
the USA, the Middle East, Asia, and Australia to improve the representativeness of the data and obtain more
accurate results.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bellazzi</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Zupan</surname>
          </string-name>
          ,
          <article-title>Predictive data mining in clinical medicine: current issues and guidelines</article-title>
          .
          <source>International Journal of Medical Informatics</source>
          <volume>77</volume>
          (
          <year>2008</year>
          ),
          <fpage>81</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Berka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rauch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Zighed</surname>
          </string-name>
          ,
          <article-title>Data Mining and Medical Knowledge Management: Cases and Applications</article-title>
          . Idea Group Inc (IGI),
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          ,
          <article-title>Ontologies for mining biomedical data</article-title>
          .
          In: <source>IEEE International Conference on Bioinformatics and Biomedicine</source>
          , Philadelphia, Pennsylvania,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fuller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Friedman</surname>
          </string-name>
          , and W. Hersh, editors,
          <source>Medical Informatics: Knowledge Management and Data Mining in Biomedicine</source>
          , Springer Science,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K. J.</given-names>
            <surname>Cios</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <article-title>Uniqueness of medical data mining</article-title>
          ,
          <source>Artificial Intelligence in Medicine</source>
          <volume>26</volume>
          (
          <year>2002</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <source>Explaining International IT Application Leadership: Health IT</source>
          . The Information Technology &amp; Innovation Foundation
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Dzemyda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kurasova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Medvedev</surname>
          </string-name>
          .
          <article-title>Dimension reduction and data visualization using neural networks</article-title>
          . In:
          <string-name>
            <given-names>I.</given-names>
            <surname>Maglogiannis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Karpouzis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wallace</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Soldatos</surname>
          </string-name>
          , editors,
          <source>Emerging Artificial Intelligence Applications in Computer Engineering</source>
          <volume>160</volume>
          (
          <year>2007</year>
          ), IOS Press,
          <fpage>25</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Koh</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Data mining applications in healthcare</article-title>
          ,
          <source>Journal of Healthcare Information Management</source>
          <volume>19</volume>
          (
          <issue>2</issue>
          ) (
          <year>2005</year>
          ),
          <fpage>64</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>The magical number seven, plus or minus two: some limits on our capacity for processing information</article-title>
          ,
          <source>The Psychological Review</source>
          <volume>63</volume>
          (
          <year>1956</year>
          ),
          <fpage>81</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Pardalos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. L.</given-names>
            <surname>Boginski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Vazacopoulos</surname>
          </string-name>
          , editors,
          <source>Data Mining in Biomedicine</source>
          , Springer Science,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Canlas</surname>
            <suffix>Jr.</suffix>
          </string-name>
          ,
          <article-title>Data Mining in Healthcare: Current Applications and Issues</article-title>
          , Thesis, Carnegie Mellon University, Australia,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Stroetmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Artmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Stroetmann</surname>
          </string-name>
          .
          <source>European Countries on their Journey Towards National eHealth Infrastructures</source>
          . Final European Progress Report. European Commission, DG Information Society and Media, ICT for Health Unit
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Stühlinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hogl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Stoyan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Intelligent data mining for medical quality management</article-title>
          .
          In: <source>Workshop Notes of the 14th European Conference on Artificial Intelligence</source>
          ,
          <year>2000</year>
          ,
          <fpage>55</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>V.</given-names>
            <surname>Špečkauskienė</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Lukoševičius</surname>
          </string-name>
          .
          <article-title>Methodology of adaptation of data mining methods for medical decision support: case study</article-title>
          ,
          <source>Electronics and Electrical Engineering</source>
          <volume>2</volume>
          (
          <issue>90</issue>
          ) (
          <year>2009</year>
          ),
          <fpage>25</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Treigys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Šaltenis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dzemyda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Barzdžiukas</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Paunksnis</surname>
          </string-name>
          ,
          <article-title>Automated optic nerve disc parameterization</article-title>
          ,
          <source>Informatica</source>
          <volume>19</volume>
          (
          <issue>3</issue>
          ) (
          <year>2008</year>
          ),
          <fpage>403</fpage>
          -
          <lpage>420</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bhatnagar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <article-title>The impact of data mining techniques on medical diagnostics</article-title>
          ,
          <source>Data Science Journal</source>
          <volume>5</volume>
          (
          <year>2006</year>
          ),
          <fpage>119</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Thabane</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Holbrook</surname>
          </string-name>
          .
          <article-title>Application of DM techniques in pharmacovigilance</article-title>
          ,
          <source>British Journal of Clinical Pharmacology</source>
          <volume>57</volume>
          (
          <issue>2</issue>
          ) (
          <year>2003</year>
          ),
          <fpage>127</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Wulff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosenberg</surname>
          </string-name>
          ,
          <source>Philosophy of Medicine: An Introduction</source>
          , Blackwell Scientific Publications, Oxford,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19] Healthcare Information and Management Systems Society.
          <source>Electronic Health Records: A Global Perspective</source>
          . White paper. HIMSS Enterprise Systems Steering Committee and the Global Enterprise Task Force
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20] Division of Health Care Statistics.
          <source>NCHS Health E-Stat Report</source>
          . National Center for Health Statistics of US,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          Workbook.
          <source>Conducting Survey Research</source>
          . Centre for Health Promotion of University of Toronto,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <source>Google Scholar</source>
          - Web search engine that indexes scholarly literature. Available from: http://scholar.google.com/.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <source>Google Trends</source>
          . Web portal. Available from: http://www.google.com/trends/.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <article-title>National Library of Medicine - MeSH</article-title>
          . Available from: http://www.nlm.nih.gov/mesh/meshhome.html.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <source>PubMed</source>
          - Database of References and Abstracts on Life Sciences and Biomedical Topics. Available from: http://www.ncbi.nlm.nih.gov/pubmed/.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] Web of Science - Academic Citation Index Provided by Thomson Reuters. Available from: http://apps.isiknowledge.com/</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <article-title>eHealth Server news portal</article-title>
          . Available from: http://www.ehealthserver.com/research-anddevelopment/935-survey-on-application-of-data-mining-to-support-clinical-decisions.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>