=Paper=
{{Paper
|id=None
|storemode=property
|title=Process Model for Data Mining in Health Care Sector
|pdfUrl=https://ceur-ws.org/Vol-729/paper3.pdf
|volume=Vol-729
}}
==Process Model for Data Mining in Health Care Sector==
<pdf width="1500px">https://ceur-ws.org/Vol-729/paper3.pdf</pdf>
<pre>
        Process model for data mining in health care sector

           Diego Roa - María del Pilar Villamil
                        Los Andes University
                          Bogota, Colombia
         {df.roa34,mavillam}@uniandes.edu.co

                       Juan Diego Arboleda
                             Oracle
                          Bogota, Colombia
                juan.arboleda.tabares@oracle.com


ABSTRACT
This paper presents a process model to guide data mining
projects in the health care sector. The process model (PMH) is a
specialization of the CRISP-DM methodology and addresses
different issues associated with data analysis and management.
The proposal was validated on real problems related to health
care in Colombia. The results show that it is possible to
establish new hypotheses about the clinical data, and to
revalidate these affirmations, using the proposed process model.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: healthcare sector;
D.2.8 [Data mining]: Process

Keywords
data mining, process model, healthcare, PMH

1.   INTRODUCTION
The knowledge discovery process is one of the main factors that
enhance competitiveness in organizations. The use of data mining
techniques in this process is essential to analyse, understand
and predict the behaviours of an organization.

In the health care sector, there are many opportunities to apply
data mining. Some of them are related to the improvement of
quality control in health care. In particular, analyses to
detect and diagnose diseases, to predict the responses of the
organism to specific treatments and to identify epidemiological
profiles are relevant topics for the health care community.

There are many methodologies to tackle data mining
opportunities, such as CRISP-DM [2] or the virtuous cycle of
data mining [8]. All of them are designed to improve the success
of data mining projects. These methodologies are used in many
sectors, such as the financial, pharmaceutical or health care
industries. However, there are very specific characteristics
associated with these sectors that can be used to customize the
process and to improve the quality and effectiveness of this
kind of project.

This paper presents a process model to guide the data mining
process in the health care sector. This process model reduces
the costs and resources used in data mining projects. The
process model was evaluated by analyzing 49,000,000 individual
registers of health care (RIPS) obtained from different sources,
such as Health Maintenance Organizations (HMOs) and the Ministry
of Social Protection in Colombia, from 2003 to 2006. The goal of
the evaluation was to compare treatments among HMOs and to
verify compliance with standards of evidence gained from the
scientific method (EBM, evidence-based medicine), which will
support the quality process of health services. The analysis
focused on the hypertension diagnosis and revealed similarities
between the national guidelines and the health service.
Specifically, it was possible to identify the use of captopril,
an angiotensin-converting enzyme inhibitor, in the hypertension
treatment. This medicine is cheaper than other medicines of this
type. One hypothesis is that it is prescribed for economic
reasons. However, patients with this kind of treatment return to
the healthcare institution with complications of hypertension.
These complications increase the cost of the illness. Finally,
the validation of this process model opens new opportunities to
establish public health policies in Colombia.

This paper is organized as follows. Section 2 describes problems
related to the data mining process in healthcare. Section 3
presents the characteristics of health care data management and
PMH, the process model proposed for data mining in the
healthcare sector. Section 4 presents the validation method of
the proposed process. Finally, Section 5 concludes the paper and
presents other research issues.

2.   DATA MINING AND HEALTH CARE
Many studies show the relevance of data mining techniques in the
health care sector. These studies are associated with the
treatment of patients and, generally, with the identification of
best practices in the treatment of specific diseases.

Some works such as [6, 16, 17, 15, 9] show different categories
of problems related to the health care sector that are solved
using data mining techniques. Some of them present the use
of association rules, sequential patterns or clustering in the
prediction, analysis and monitoring of patients’ treatments.
On the other hand, there are studies such as [1, 7] that
propose new algorithms to solve these issues.

Abidi and Stolba in [14, 10] describe the relevance of
identifying clinical guides based on the individual registers of
health care. These guides support medical tasks, increasing the
quality of service of medical centers. However, these works
focus on structuring clinical guidelines, and they do not
emphasize the methodology used to carry out these projects.

Although some works such as [5] use the CRISP-DM methodology and
others [11] the virtuous cycle of data mining to improve the
process quality, there are some characteristics associated with
the specific domain that can be used to reduce the number of
incidents that may arise in a data mining project. These issues
create an opportunity for methodologies that explicitly include
clinical concepts and problems related to the health care sector
and that, moreover, support the selection of data mining
techniques based on the specific characteristics of this sector.

The aforementioned issues motivate PMH, a specific process model
for the health care context, which is presented in section 3.


3.    DATA MINING GUIDE FROM THE HEALTH
      CARE POINT OF VIEW
This section presents a brief description of the health care
context in subsection 3.1, in order to highlight the opportunity
to provide new guides for data mining applications that improve
different kinds of projects in health care. Subsection 3.2
describes a new process model to guide these projects.


3.1    Health care data management
The health care sector faces challenges with respect to data
management because of data characteristics such as volume,
quality, availability and accessibility, and because of the
relevance of the decisions involved in the analysis process.

Moreover, there are clinical guidelines that describe a set of
steps to treat a specific disease. From these guides, it is
possible to determine the service efficiency, the time involved
in the treatment, and typical practices such as treatments and
medications. This kind of information provides important
elements to the decision maker.

Regarding the volume of data analyzed, it is important to
highlight that all clinical cases are relevant during the
analysis process. This is contrary to other sectors. In other
domains a rule is meaningful when its support in the data is
relatively high, whereas in the health care domain, analyses
involving mortalities are taken into account even though the
number of cases is not statistically significant.

The ideas mentioned above motivate the development of the
process model for healthcare (PMH), which is described in the
next sections.

3.2   PMH overview
The knowledge in data mining projects frequently remains with a
few people, such as consultants and experts in specific domains.
For this reason, the experiences and processes cannot be reused
and applied to similar projects in health care. Furthermore, it
is necessary to know different guidelines, references and
standards related to quality of service in order to understand
the main characteristics of the health care domain.

As a result, this paper presents PMH (Process Model for the
health care context). This process is a specialization of the
CRISP-DM methodology proposed for the Colombian health care
context, based on the verification carried out on the
pharmacological and non-pharmacological treatments in Colombia’s
health care institutions (IPS), through the application of data
mining techniques on RIPS files. This process reduces time and
resources compared with developing mining projects in this
domain without specific knowledge.

This guide proposes seven steps, applied iteratively and taking
into account the health care context. The following paragraphs
describe the different steps involved in this process model. The
numbers used in this description correspond to the numbers used
in figure 1.

       Figure 1: Data mining methodology for health care sector

I. Scope Definition of the exercise. This step defines the
business problem to be analysed. Several fundamental aspects
must be clarified in this step: what is assessed, how and why,
as well as the criteria for the success of the exercise. Some of
the typical questions proposed by experts focus on validating
the effectiveness of pharmacological treatments in the emergency
room according to the IPS’s best practices, others on the
control and monitoring of chronic diseases according to the
standards. In this step, these questions are contextualized
according to the service (hospitalizations, emergencies,
procedures or medical appointments) and the diagnosis to be
monitored.
II. Selection of the reference guide. This step con-               and availability. These issues will be taken into account to
sists of the selection of the reference guide(s) to evaluate the   decide the use of these sources during the analysis process.
question proposed in the first step; for example the medi-
cal treatment used for a diagnosis. Currently, it is possible      In the healthcare sector, there are different sources that can
to use expert advice to validate the quality of service            be obtained and used for the development of data mining
in health care. On the other hand, there are specific manu-        projects.
als such as standards, protocols, and clinical guidelines pro-
posed by governments and organizations that can be used            Generally, countries have an individual healthcare register
as a reference guide.                                              corresponding to every hospitalization, urgency or proce-
                                                                   dure associated to a patient. For example, the United States
Standards are evidence-based references used to eval-              has the Electronic Health Record (EHR), which includes de-
uate the quality and performance of services, while protocols      mographics, medical history, medication and allergies, im-
are documents that describe the rules of action depending          munization status and observations (among others). On
on a specific circumstance [4]. Usually, protocols are spe-        the other hand, Colombia has the Individual Registers of
cific documentation defined by each IPS. Also, the clinical        Health Care (RIPS in Spanish), which provide information
guidelines are systematically developed statements to assist       related to the delivery of health services and demographic
practitioners and patient decisions about appropriate health       variables.
care for specific clinical circumstances [12].
                                                                   There are other kinds of information sources related to na-
About clinical guidelines, the World Health Organization           tional behavior, such as national surveys or naming standards.
(WHO) presents different guides related to the diagnosis,          Some national surveys contain demographic and health in-
treatment and prevention of specific diseases such as obesity,     formation that can be used to support the data mining pro-
malnutrition or diabetes. Furthermore, different countries         cess. Furthermore, the WHO defines the CIE10 (ICD-10) standard.
have established national standards to treat a disease. For        This standard defines the classification and organization of
example, Colombia has Resolution 412, which suggests               diseases based on a unique code that represents the category
the set of activities and procedures that should be used in        and the specific condition.
public health diseases [13].
                                                                   Some naming standards are associated to specific health in-
Although clinical guidelines vary in content, they have es-        terventions available for each country. For example, the Aus-
sentially the following structure:                                 tralian Classification of Health Interventions (ACHI) con-
                                                                   tains all the procedures that are performed by HMOs in the
 Clinical guideline structure                                      country. The Unique Procedures Classification in healthcare
 0. Authors                                                        (CUPS in Spanish) is the Colombian classification system for
 1. Introduction                                                   this information.
 2. Disease detection
 3. Diagnosis                                                      IV. Selection and preparation of healthcare informa-
 4. Classification and Tracking                                    tion. It is necessary to have the support of health experts
 5. Disease evaluation                                             to select the information that is highly relevant to solve the
 6. Non-pharmacological treatment                                  proposed problem in step I. Each disease presents different
 7. Pharmacological treatment                                      characteristics. For example, there are diseases like prostate
                                                                   cancer or pre-eclampsia in which sex is a determining
 8. Disease complications
                                                                   factor. The first affects men, and the latter applies only
 9. Disease special situations
                                                                   for pregnant women. In chronic diseases like hypertension,
 10. Hospital treatment
                                                                   time is a significant variable. Its treatment is based on mon-
 11. Emergency treatment                                           itoring the patient periodically with formulated procedures
 12. Clinical guidelines future review recommendations             and medicines according to a specific order and to patient’s
 13. Bibliography                                                  evolution over time. On the contrary, appendicitis treatment is
                                                                   considered relatively short. In this case the time variable
In the structure above, the interest lies (mainly) in points 4     is not relevant. These considerations must be taken into
to 11. The quality control proposed is based on the compar-        account in the selection step.
ison between treatments with a specific admission diagnosis
and an established clinical guide diagnosis.                       The selected information then follows a data cleaning process.
                                                                   Generally, health information has problems related to data
The suggestion is to choose the reference guide that has been      management, such as replication of records. Moreover, medicine
established in the national policies or regulations, because       data management poses a new challenge to data mining
international clinical guidelines may not have validity in a       experts. Usually, this data is not entered in a standard
specific country, or may not be applied in certain IPS be-         form. For example, the medicine "amoxycillin" can be entered
cause of socio-economic or epidemiological factors.                as "amox" or "amoxycilin". In these cases, word similarity
                                                                   analysis can be used to solve the issue.
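
The word similarity idea for medicine names can be illustrated with a
minimal sketch. It is not the procedure used by the authors: the
vocabulary, the function name normalize_medicine, the prefix rule for
abbreviations and the 0.6 cutoff are all assumptions made for
illustration, and the similarity measure is the one provided by
Python's standard difflib module.

```python
from difflib import get_close_matches

# Hypothetical canonical vocabulary (assumption); a real project would
# load the national medicine catalogue instead.
CANONICAL = ["amoxycillin", "captopril", "enalapril", "losartan"]


def normalize_medicine(raw, vocabulary=CANONICAL, cutoff=0.6):
    """Return the closest canonical medicine name, or None if nothing is close."""
    name = raw.strip().lower()
    # Abbreviations such as "amox" are resolved by an unambiguous prefix
    # first, because a short prefix can score poorly on whole-word similarity.
    prefix_hits = [v for v in vocabulary if v.startswith(name)]
    if len(name) >= 4 and len(prefix_hits) == 1:
        return prefix_hits[0]
    # Otherwise fall back to difflib's similarity ratio (assumed cutoff).
    matches = get_close_matches(name, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this sketch, both "amox" and "amoxycilin" map to "amoxycillin",
while a name that resembles nothing in the vocabulary is returned as
None so that a health expert can review it instead of it being merged
silently.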
III. Identification of information sources. This step
consists of the identification of useful information sources       The corresponding discretization of continuous data and the
according to the scope of the project and the selected refer-      standardization of information must be made, necessary pro-
ence guides. This identification depends on the data quality       cedures for the execution of mining algorithms, which should
be in line with the business rules of the selected diagnosis.
For example, in the case of Alzheimer’s disease, age categories
should be created after 40 years, consistent with the
characteristics of vulnerable populations and the evolution of
the disease over time. This differs from appendicitis, where
much broader age ranges should be used, since it is a disease
that can occur at any age. The complexity of both the
discretization and the standardization of data may depend
largely on the number of selected information sources and on the
absence of ontologies for the unification and standardization of
medical terms and concepts.

A statistical analysis of this step is necessary for physicians,
because it is important to know the percentage of data that must
be cleaned and the problems that arise in the datasets.

V. Information adjustment and preliminary analysis. In this
step it is important to analyse the dataset resulting from the
previous phase. This analysis aims to identify the main
characteristics of the data and to discover new variables that
are relevant for a specific situation. In healthcare, variables
such as hospitalization window, total cost of treatment and
patient satisfaction are relevant in many situations, for
example to answer questions such as: what is the most expensive
treatment? Which one has the lowest satisfaction? Which one
has the lowest rate of days of stay?

The preliminary analysis is based on descriptive statistics. The
objective is to review the statistical distribution of the data
in order to avoid biased results. This step can identify how
many men or women are involved in the dataset, what the age
distribution is, or whether the treatment is based significantly
on drugs rather than procedures. The health experts evaluate the
results and, if necessary, this step is repeated.

VI. Selection and implementation of data mining algorithms.
This step consists of identifying the data mining algorithms
that achieve the proposed objectives. In healthcare, the mining
problems can be classified into three groups. The first is
related to the analysis of treatments, where it is feasible to
predict the response of the organism to specific procedures. The
second is the monitoring and evolution of patients with
infectious and chronic diseases. The last is based on verifying
the proper provision of health services. For example, for the
last group there are situations in which it is appropriate to
know the percentage of compliance of treatments with respect to
clinical guidelines, or what the implications of meeting or
failing the clinical guidelines are in terms of costs. The data
mining algorithms proposed to solve specific problems in the
healthcare sector are association rules, sequential patterns and
clustering techniques.

Association Rules: this technique identifies cause-effect
relations between variables. It is possible to characterize a
specific group using clustering techniques, and to determine the
behaviour of a specific cluster using association rules.

In healthcare, it is interesting to discover relations among
events. For example, if a patient’s disease evolves to a chronic
phase, the probability of a decease based on specific
characteristics can be found. Moreover, in the context of
quality control, association rules allow us to discover rules
that may or may not correspond to the treatments established in
clinical guidelines. It is advisable to use this technique for
diagnoses of acute illness. Treatments for this kind of illness
are usually applied during a patient’s admission to the IPS and
do not require periodic monitoring to prescribe new procedures
or drugs depending on the patient’s progress. Furthermore,
association rules could be used in problems that analyze the
behaviour of treatments.

A possible way to perform quality control in health care is as
follows:

Two data sets are created, one for drug treatment information
and one for non-drug information, for each patient with the same
admission diagnosis and the same method of admission. It is
important that information from different admission methods is
not mixed, since the treatments in each can be different. Thus,
given an X diagnosis in hospitalization, we have the following
data set of medications:

                 Figure 2: medication dataset

For this example, an association rule would be: "In the
pharmacological treatment of a patient with an X diagnosis, if
medicines M30 and M90 are provided, then medicine M70 will also
be provided."

This technique requires two parameters to be customized: the
support and the confidence. The first one is the proportion of
transactions in the data set which contain the itemset:

supp(A) = occurrence(A) ÷ size(dataset).

The second one is defined as the conditional probability
P(B|A), the occurrence of B given the occurrence of A:

conf(A ⇒ B) = supp(A ∪ B) ÷ supp(A).

Sequential Patterns: this technique detects cause-effect
relations considering the time periods in which transactions
occurred. In the health care context, there are many situations
that involve periodic controls and patient monitoring. As
mentioned before, standards, clinical guidelines and protocols
specify sequences of treatments that should be applied in a
specific situation. For this reason, it is feasible to compare
the patient’s procedures against reference guides using
sequential patterns. Chronic diseases like hypertension or
diabetes require that patients return to the IPS several times
with the same diagnosis. This kind of problem involves many
variables in each time period. The recommendation is to use
sequential patterns to analyze the evolution of the patient’s
health based on the treatments applied.
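
The support and confidence measures defined above for association
rules can be made concrete with a short sketch. The transactions are
invented for illustration (they reuse the medicine codes M30, M90 and
M70 from the example rule), and the function names are assumptions
rather than part of any mining tool.

```python
def support(itemset, transactions):
    """supp(A): share of transactions that contain every item of A."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(A => B) = supp(A u B) / supp(A); 0.0 when A never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


# Hypothetical medication transactions, one per patient admission.
treatments = [
    {"M30", "M90", "M70"},
    {"M30", "M90", "M70"},
    {"M30", "M90"},
    {"M30", "M10"},
]

rule_support = support({"M30", "M90", "M70"}, treatments)
rule_confidence = confidence({"M30", "M90"}, {"M70"}, treatments)
```

In this toy data the rule "M30 and M90 imply M70" holds in two of the
four transactions, and in two of the three transactions that contain
both M30 and M90.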

Since a clinical event is, at any given time, the formulation of
one or more drugs or procedures, we propose the creation of two
data sets: a drugs data set and a procedures data set. In both,
the records should be grouped by patient, with clinical events
sorted in ascending order by date and time.
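
The grouping and ordering just described can be sketched as follows.
The record layout (patient, timestamp, item) and the function name
build_sequences are assumptions made for illustration; the RIPS files
themselves have a different structure.

```python
from collections import defaultdict
from datetime import datetime


def build_sequences(records):
    """Group (patient, timestamp, item) records into per-patient sequences.

    Each sequence is a list of clinical events in ascending date and time
    order; each event is the sorted list of items formulated at that moment.
    """
    events = defaultdict(lambda: defaultdict(set))
    for patient, timestamp, item in records:
        events[patient][timestamp].add(item)
    return {
        patient: [sorted(by_time[ts]) for ts in sorted(by_time)]
        for patient, by_time in events.items()
    }


# Hypothetical drug records, deliberately out of chronological order.
records = [
    ("P1", datetime(2005, 3, 1, 9, 0), "M30"),
    ("P1", datetime(2005, 1, 10, 8, 30), "M90"),
    ("P1", datetime(2005, 3, 1, 9, 0), "M70"),
    ("P2", datetime(2004, 7, 5, 14, 0), "M30"),
]
sequences = build_sequences(records)
```

The same function would be applied separately to the drugs records and
to the procedures records, giving the two data sets proposed above.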


                                                                         Figure 5: dataset for clustering technique


                                                                   does not necessarily imply that it should be discarded. It all
                                                                   depends on the clinical context that is being evaluated and
                                                                   the criteria of the medical expert.
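
The point about small clusters can be illustrated with a minimal
sketch that reports how many people support each cluster and flags the
small ones for expert review instead of dropping them. The 5%
threshold, the labels and the function names are assumptions made for
illustration.

```python
from collections import Counter


def cluster_shares(labels):
    """Return {cluster: (count, share of the population)}."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: (n, n / total) for cluster, n in counts.items()}


def flag_small_clusters(labels, min_share=0.05):
    """List clusters below min_share; they are flagged for review, not removed."""
    shares = cluster_shares(labels)
    return sorted(c for c, (_, share) in shares.items() if share < min_share)


# Hypothetical cluster assignment for 100 patients.
labels = ["A"] * 60 + ["B"] * 38 + ["C"] * 2
small = flag_small_clusters(labels)
```

Here cluster C covers only 2% of the patients, so it is listed for the
medical expert to judge in its clinical context.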

                                                                   VII. Result Validation and Impact Analysis. This
                                                                   final step concerns tuning the data mining model
Figure 3: structure of a sequential pattern dataset                based on the health expert feedback and the results achieved.
                                                                   A detailed study of the results is made by a board of medical
All clinical events from a patient arranged in order, can be       experts in the area supported by a technical group of people
seen together as a sequence, where each event corresponds to       to determine whether or not to repeat some of the above
a set of drugs or procedures. Figure 4 shows the sequences         steps, possibly with a change in strategy or range, or by a
found in previous patients.                                        refinement of the data used, to get specific conclusions.
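
Once each patient's clinical events are seen as a sequence, checking
them against the ordered steps of a reference guide can be sketched as
below. This is a simple containment test rather than a full sequential
pattern mining algorithm, and the guide, the codes and the function
names are assumptions made for illustration.

```python
def contains_sequence(patient_events, pattern):
    """True if the pattern steps occur, in order, within the patient's events.

    Events and pattern steps are collections of items; a step matches an
    event when all of its items appear in that event. One event can match
    at most one step.
    """
    position = 0
    for event in patient_events:
        if position < len(pattern) and set(pattern[position]) <= set(event):
            position += 1
    return position == len(pattern)


def sequence_support(sequences, pattern):
    """Share of patients whose sequence contains the pattern."""
    if not sequences:
        return 0.0
    matched = sum(1 for ev in sequences.values() if contains_sequence(ev, pattern))
    return matched / len(sequences)


# Hypothetical guide: procedure P10 first, then medicine M30 at a later event.
guide = [{"P10"}, {"M30"}]
patients = {
    "A": [{"P10"}, {"M30", "M70"}],  # follows the guide
    "B": [{"M30"}, {"P10"}],         # same items, wrong order
    "C": [{"P10", "M30"}],           # both in a single event
}
guide_support = sequence_support(patients, guide)
```

Only patient A follows the guide in this toy data, so the pattern is
supported by one patient out of three.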

                                                                   4.    VALIDATION
                                                                   This proposal is validated based on the PMH process and
                                                                   the product obtained by applying these steps in the solution
                                                                   of a real problem in Colombia’s health care sector.

                                                                   Our proposal is a specialization of the CRISP-DM method-
      Figure 4: sequence of patients procedures                    ology. The CRISP-DM process has been validated in several
                                                                   domains and modified during more than a decade, based on
Clustering: this technique finds groups of elements with           the application of the process in many data mining projects.
similar characteristics. In healthcare, it is very common to       In that sense, we refined the CRISP phases in order to im-
analyze populations based on specific characteristics. It is       prove the knowledge and special issues of the health care
also possible to use this technique to validate that the right     sector, reducing the time and resources that have to be used
treatments are given to the right people. The reference guides     to understand this particular domain.
define specific treatments for people with specific demographic
characteristics. In this case, the use of clustering determines    The PMH process incorporates the diagnosis and reference
subsets based on procedures and demographic information.           guide selection in the business understanding phase of CRISP-
                                                                   DM, and presents the selection criteria for these steps. Fur-
In addition, there are situations in which data quality is         thermore, in the other phases it is possible to understand the
poor. In such cases it is impossible to analyze particular         principal problems associated with data quality in health
issues in the dataset because of low confidence in the data.       care, and to determine which algorithms should be used to
The suggestion is to use clustering techniques to generalize       solve several problems in this sector.
the main characteristics of a specific group. To perform this
analysis, a data set must be created which includes patient        The next sections focus on the product validation. They
information such as gender, age, marital status and race           apply the PMH process to the quality control of Colombia’s
(among others), as well as the drugs and/or procedures             health care sector and validate the results based on the ex-
provided, grouped into the treatments found in the previous        perts’ criteria. The development of this exercise consists of
section, as shown in figure 5.                                     two iterations of the process. The specific steps of PMH
                                                                   are described below.
As in previous algorithms, it is important to analyze the
number of people supporting each cluster, before making            4.1   Problem description
any conclusions. It is also essential to understand, that a        Hypertension is a chronic disease that affects 20% of people
cluster represents a very small percentage of the population       in the world. This is considered the first cause of morbility
and the most representative disease related to cardiovascular affections.
For this reason, the objective of this exercise was to evaluate the
pharmacological and non-pharmacological treatment for this disease in
Colombia.

4.2    First iteration
Step I consists of the description of the problem according to the context,
related to the diagnosis and type of disease that is going to be tackled. In
this case, it is relevant to analyze the characteristics of hypertension.
This is a chronic disease, generally asymptomatic, and it requires continuous
medical assistance. The typical complications of hypertension are related to
cardiac failures. These complications imply hospitalizations, emergency care
and complex procedures.

Based on the health experts’ support, there are different reference guides
related to hypertension, but its treatment may differ from one country to
another. For this reason, the selection of the reference guide focused on the
"Clinical guideline for hypertension disease" proposed by the Association of
Faculties of Medicine of Colombia [3].

As mentioned in section 3, Colombia has the Individual Registers of Health
Care (RIPS). The purpose of these data is to bill the delivery of health
services by the Colombian IPS. The information was collected from HMOs and
the Minister of Social Protection from 2003 to 2006 (49,000,000 individual
registers of health care). Furthermore, the RIPS data are filled in based on
the CIE10 (ICD-10) and CUPS standards. For this reason, this information is
also taken into account.

According to the expert opinion, the relevant information for this analysis
is described in table 2:

                  Variables
                  Principal diagnosis
                  IPS Identification
                  Date
                  Sex
                  Department
                  Type of medical service
                  Medicine
                  Procedures
                  Length of stay
                  Decease diagnosis
                  Related diagnosis

The statistical analysis results for the dataset show that 77% of the
registers correspond to procedures, 17% are medical appointments and 6% are
medicine prescriptions. Moreover, 67% correspond to men and 33% to women. On
the other hand, 16% of the patients return to the IPS for health controls.

In the pharmacological treatment of hypertension, we want to analyze the most
relevant medication sequences. In this case, the time variable is highly
relevant because we want to trace the prescription of medicines for a
specific disease. For this reason, we used sequential patterns, according to
the ideas presented in subsection 3.2. Our model was implemented using the
IBM Intelligent Miner 8.1 data mining tool.

Figure 6 shows the results of the mining model.

                  Figure 6: Sequential patterns results

The main result is the use of captopril. More than 40% of the patients were
medicated with this medicine. Based on health experts’ opinions, captopril is
a medicine used in mono-therapy treatments and is prescribed for economic
reasons. Another important conclusion is related to the data quality of
medicines; a specific naming standardization was used to resolve the problem.
In the same way, several manual processes were carried out to mitigate the
problem of replicated records.

4.3    Second iteration
Based on the results of the first iteration, a second iteration of the steps
suggested in PMH was performed. The health experts proposed to analyze
whether the health system may incur higher costs because of the prescription
of captopril to the patients. Using the RIPS data, we want to determine the
complications related to these patients.

According to the PMH, it is necessary to prepare the data that will be used
by the model. For this reason, the data presents demographic information and
relevant aspects about the evolution of a patient’s disease.

In this case, new variables have to be included based on the health experts’
recommendation. These variables are associated with the evolution to a
chronic phase in the clinical history of patients. For this reason, the
following were introduced: the date of the patient’s complication, the
associated diagnosis, and the number of hospitalizations or procedures before
and after the complication of hypertension. The next step of the methodology
proposes a preliminary analysis of the information. In this case, the
information consists of all the records of the patients that have suffered
this disease.

To describe the complication of a patient, the sequential clustering
technique was used. As described in section 3, with this technique it is
possible to find clusters of patients with similar sequences and
characteristics.

The results of the second iteration show that patients with similar sequences
associated with the use of captopril have complications such as chronic
cardiac failures, hypertensive crisis or heart attacks.

In general, it is important to note that captopril is prescribed for economic
reasons. This strategy is useful in the short term, but in the long term we
can observe that patients with this kind of treatment return
to the healthcare institution with complications of hypertension. These
complications increase the cost of the illness.

5.   CONCLUSIONS AND FUTURE WORK
This paper proposes a process model to guide the data mining process in the
health care sector. It suggests a set of iterative and optional steps to
improve the results of the mining process. This process model was evaluated
by analyzing the quality of service for the treatment of hypertension in
Colombia. The results show that it is possible to establish new hypotheses
about the datasets, and revalidate these affirmations using the proposed
process model. In the same way, these results show some of the support
provided to the data mining expert to guide their process, especially
regarding knowledge about the healthcare context such as data sources,
reference guides and data mining techniques.

An exhaustive validation of the process model is considered as future work,
in terms of a formal comparison between the use of CRISP-DM and PMH. At the
same time, it will be interesting to address new kinds of questions from the
expert point of view using this process model, in particular the
identification of an epidemiological profile for the Colombian population.

6.   ACKNOWLEDGMENT
The authors appreciate the support of Jose Abasolo, professor at Los Andes
University, who provided the initial ideas and suggestions for the
development of this article.

7.   REFERENCES
 [1] E. A.M. and H. Fu. Privacy preserving distributed
     learning clustering of healthcare data using
     cryptography protocols. In Computer Software and
     Applications Conference Workshops (COMPSACW),
     2010 IEEE 34th Annual, pages 140–145, July 2010.
 [2] P. Chapman, J. Clinton, R. Kerber, T. Khabaza,
     T. Reinartz, C. Shearer, and R. Wirth. CRISP-DM 1.0
     step-by-step data mining guide. Technical report, The
     CRISP-DM consortium, August 2000.
 [3] A. colombiana de Facultades de Medicina. Guía clínica
     para la hipertensión arterial.
     http://www.redsalud.gov.cl/archivos/guiasges/
     hipertension arterial primaria.pdf.
 [4] eHow Health. Definition of clinical protocol.
 [5] X. Fang. Are you becoming a diabetic? A data mining
     approach. In Fuzzy Systems and Knowledge Discovery,
     2009. FSKD ’09. Sixth International Conference on,
     volume 5, pages 18–22, August 2009.
 [6] K. Harleen and S. K. Wasan. Empirical study on
     application of data mining techniques in health care.
     Journal of computer science 2, pages 194–200, 2006.
 [7] A. Li, S. Wang, H. Zheng, L. Ji, and J. Wu. A novel
     abnormal ECG beats detection method. In Computer
     and Automation Engineering (ICCAE), 2010 The 2nd
     International Conference on, volume 1, pages 47–51,
     February 2010.
 [8] B. M.J.A. and L. G.S. Mastering data mining. 2000.
 [9] L. N. Predicting the risk of future hospitalization. In
     Database and Expert Systems Applications (DEXA),
     2010 Workshop on, pages 120–124, September 2010.
[10] S. N. and T. A. M. The relevance of data warehousing
     and data mining in the field of evidence-based
     medicine to support healthcare decision making. 2002.
[11] L. Nevine and M. Malek. Data mining for cancer
     management in Egypt case study: Childhood acute
     lymphoblastic leukemia. 2005.
[12] N. I. of Health. About clinical practice guidelines.
[13] M. of Social Protection. Resolución 412.
     http://mps.minproteccionsocial.gov.co/pars/caja-
     herram/documentos/Biblioteca/CompendioNormativo/
     resolucion 412 00.pdf, 2002.
[14] A. S. S. Raza and A. S. Raza. A case for
     supplementing evidence base medicine with inductive
     clinical knowledge: Towards a technology-enriched
     integrated clinical evidence system. In Proceedings of
     the Fourteenth IEEE Symposium on Computer-Based
     Medical Systems, CBMS ’01, pages 5–, Washington,
     DC, USA, 2001. IEEE Computer Society.
[15] N. R. T. and P. Jian. Introduction to the special issue
     on data mining for health informatics. SIGKDD
     Explor. Newsl., 9:1–2, June 2007.
[16] van Driel M. A., C. K., K. P. P., L. J. A., and B. H.
     G. A new web-based data mining tool for the
     identification of candidate genes for human genetic
     disorders. Eur J Hum Genet, 11(1):57–63+, 2003.
[17] C. Yu, P. L. Henning, C. W. W., and O. Jorn. Drug
     exposure side effects from mining pregnancy data.
     SIGKDD Explor. Newsl., 9:22–29, June 2007.

</pre>