=Paper=
{{Paper
|id=None
|storemode=property
|title=Process Model for Data Mining in Health Care Sector
|pdfUrl=https://ceur-ws.org/Vol-729/paper3.pdf
|volume=Vol-729
}}
==Process Model for Data Mining in Health Care Sector==
Diego Roa, María del Pilar Villamil
Los Andes University
Bogota, Colombia
{df.roa34,mavillam}@uniandes.edu.co

Juan Diego Arboleda
Oracle
Bogota, Colombia
juan.arboleda.tabares@oracle.com
ABSTRACT
This paper presents a process model to guide data mining projects in the health care sector. The process model (PMH) is a specialization of the CRISP-DM methodology and addresses different issues associated with data analysis and management. The proposal was validated by addressing real problems related to health care in Colombia. The results show that it is possible to establish new hypotheses about the clinical data and to revalidate these hypotheses using the proposed process model.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: healthcare sector; D.2.8 [Data mining]: Process

Keywords
data mining, process model, healthcare, PMH

1. INTRODUCTION
The knowledge discovery process is one of the main factors that enhance competitiveness in organizations. The use of data mining techniques in this process is essential to analyse, understand and predict the behaviour of an organization.

In the health care sector, there are many opportunities to apply data mining. Some of them are related to improving quality control in health care. In particular, detecting and diagnosing diseases, predicting the responses of the organism to specific treatments and identifying epidemiological profiles are relevant topics for the health care community.

There are many methodologies for tackling data mining opportunities, such as CRISP-DM [2] or the virtuous cycle of data mining [8]. All of them are designed to improve the success of data mining projects, and they are used in many sectors such as the financial, pharmaceutical and health care industries. However, each sector has very specific characteristics that can be used to customize the process and to improve the quality and effectiveness of this kind of project.

This paper presents a process model to guide the data mining process in the health care sector. The process model reduces the costs and resources used in data mining projects. It was evaluated by analyzing 49,000,000 individual health care registers (RIPS) obtained from different sources, such as HMOs and the Ministry of Social Protection of Colombia, covering the years 2003 to 2006. The goal of the evaluation was to compare treatments among Health Maintenance Organizations (HMOs) and to verify compliance with standards of evidence gained from the scientific method (EBM, evidence-based medicine), which supports the quality process of health services. The analysis focused on the hypertension diagnosis and made it possible to show similarities between the national guidelines and the health service. Specifically, it was possible to identify the use of captopril, an angiotensin-converting enzyme (ACE) inhibitor, in hypertension treatment. This medicine is cheaper than other medicines of its type, and one hypothesis is that it is prescribed for economic reasons. However, patients receiving this treatment return to the healthcare institution with complications of hypertension, and these complications increase the cost of the illness. Finally, the validation of this process model opens new opportunities to establish public health policies in Colombia.

This paper is organized as follows. Section 2 describes problems related to the data mining process in healthcare. Section 3 presents the characteristics of health care data management and PMH, the process model proposed for data mining in the healthcare sector. Section 4 presents the validation of the proposed process. Finally, Section 5 concludes the paper and presents other research issues.
2. DATA MINING AND HEALTH CARE
Many studies show the relevance of data mining techniques in the health care sector. These studies are associated with the treatment of patients and, generally, with the identification of best practices in the treatment of specific diseases.
Some works, such as [6, 16, 17, 15, 9], show different categories of problems in the health care sector that are solved using data mining techniques. Some of them present the use of association rules, sequential patterns or clustering in the prediction, analysis and monitoring of patients' treatments. On the other hand, studies such as [1, 7] propose new algorithms to solve these issues.

Abidi and Stolba [14, 10] describe the relevance of identifying clinical guides based on the individual registers of health care. These guides support medical tasks, increasing the quality of service of medical centers. However, these works focus on structuring clinical guidelines and do not emphasize the methodology used to carry out these projects.

Although some works, such as [5], use the CRISP-DM methodology and others, such as [11], use the virtuous cycle of data mining to improve process quality, there are characteristics of the specific domain that can be used to reduce the number of incidents that may arise in a data mining project. These issues create the opportunity to use methodologies that explicitly include clinical concepts and problems related to the health care sector and that, moreover, support the selection of data mining techniques based on the specific characteristics of this sector.

The aforementioned issues motivate PMH, a specific process model for the health care context, which is presented in Section 3.

3. DATA MINING GUIDE FROM THE HEALTH CARE POINT OF VIEW
This section presents a brief description of the health care context in subsection 3.1, with the purpose of highlighting the opportunity to provide new guides for data mining applications that improve different kinds of projects in health care. Subsection 3.2 describes a new process model to guide these projects.

3.1 Health care data management
The health care sector presents data management challenges because of data characteristics such as volume, quality, availability and accessibility, and because of the relevance of the decisions involved in the analysis process.

Moreover, there are clinical guidelines that describe a set of steps to treat a specific disease. From these guides, it is possible to determine the service efficiency, the time involved in the treatment, and typical practices such as treatments and medications. This kind of information provides important elements to decision makers.

Regarding the volume of data analyzed, it is important to highlight that all clinical cases are relevant during the analysis process, contrary to other sectors. In other domains a rule is meaningful when its support in the data is relatively high, whereas in the health care domain, analyses involving mortality are taken into account even when the number of cases is not statistically significant.

The ideas mentioned above motivate the development of the process model for healthcare (PMH), which is described in the next sections.

3.2 PMH overview
The knowledge in data mining projects frequently remains with a few people, such as consultants and experts in specific domains. For this reason, the experiences and processes cannot be reused and applied to similar projects in health care. Furthermore, it is necessary to know different guidelines, references and standards related to quality of service in order to understand the main characteristics of the health care domain.

As a result, this paper presents PMH (Process Model for the health care context). This process is a specialization of the CRISP-DM methodology proposed in the Colombian health care context, based on the verification carried out on pharmacological and non-pharmacological treatments in Colombia's health care institutions (IPS) through the application of data mining techniques to RIPS files. This process reduces the time and resources needed to develop mining projects in this domain without specific domain knowledge.

This guide proposes seven steps, applied iteratively, that take the health care context into account. The following paragraphs describe the different steps involved in this process model. The numbers used in this description correspond to the numbers used in Figure 1.

Figure 1: Data mining methodology for health care sector

I. Scope definition of the exercise. This step defines the business problem to be analysed. Several fundamental aspects must be clarified in this step: what assessment is done, how, and why, as well as the criteria for the success of the exercise. Some of the typical questions proposed by experts focus on validating the effectiveness of pharmacological treatments in the emergency room according to the IPS's best practices; others focus on the control and monitoring of chronic diseases according to the standards. In this step, these questions are contextualized according to the service (hospitalizations, urgencies, procedures or medical appointments) and the diagnosis to be monitored.
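The outcome of step I can be recorded as a small structured object, so that later steps can check that every required decision was made. The sketch below is only an illustration: the class name `ExerciseScope`, its field names and the spelling of the service labels are hypothetical, and the example diagnosis I10 (the ICD-10/CIE10 code for essential hypertension) is used purely as a sample value.

```python
from dataclasses import dataclass, field
from typing import List

# Service types listed in step I (the label spelling is an assumption).
SERVICES = {"hospitalization", "urgency", "procedure", "medical appointment"}


@dataclass
class ExerciseScope:
    """Hypothetical container for the decisions recorded in PMH step I."""

    what: str  # the question to be assessed
    how: str  # how the assessment will be carried out
    why: str  # why the assessment is needed
    service: str  # service that contextualizes the question
    diagnosis: str  # diagnosis to be monitored, e.g. a CIE10 (ICD-10) code
    success_criteria: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject scopes that leave out context that step I says must be fixed.
        if self.service not in SERVICES:
            raise ValueError(f"unknown service type: {self.service!r}")
        if not self.success_criteria:
            raise ValueError("at least one success criterion is required")


scope = ExerciseScope(
    what="Do pharmacological treatments follow the national guideline?",
    how="Compare prescribed medicines with the reference guide",
    why="Quality control of a chronic disease",
    service="medical appointment",
    diagnosis="I10",  # ICD-10 code for essential (primary) hypertension
    success_criteria=["Compliance percentage reviewed by health experts"],
)
print(scope.service, scope.diagnosis)
```

Validating in `__post_init__` keeps an incomplete scope from silently reaching the selection steps that depend on it.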
II. Selection of the reference guide. This step consists of selecting the reference guide(s) used to evaluate the question proposed in the first step, for example the medical treatment used for a diagnosis. It is possible to use expert advice to validate the quality of service in health care. In addition, there are specific manuals, such as standards, protocols and clinical guidelines, proposed by governments and organizations that can be used as reference guides.

Standards are evidence-based references used to evaluate the quality and performance of services, while protocols are documents that describe the rules of action for a specific circumstance [4]. Usually, protocols are specific documentation defined by each IPS. Clinical guidelines are systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances [12].

Regarding clinical guidelines, the World Health Organization (WHO) presents different guides related to the diagnosis, treatment and prevention of specific diseases such as obesity, malnutrition or diabetes. Furthermore, different countries have established national standards to treat a disease. For example, Colombia has Resolution 412, which sets out the activities and procedures that should be used for public health diseases [13].

Although clinical guidelines vary in content, they essentially have the following structure:

Clinical guideline structure
0. Authors
1. Introduction
2. Disease detection
3. Diagnosis
4. Classification and tracking
5. Disease evaluation
6. Non-pharmacological treatment
7. Pharmacological treatment
8. Disease complications
9. Disease special situations
10. Hospital treatment
11. Emergency treatment
12. Recommendations for future review of the clinical guideline
13. Bibliography

In the structure above, the interest lies mainly in points 4 to 11. The proposed quality control is based on comparing the treatments given for a specific admission diagnosis with those established in the clinical guideline for that diagnosis.

The suggestion is to choose the reference guide established in national policies or regulations, because international clinical guidelines may not be valid in a specific country, or may not be applicable in certain IPS because of socio-economic or epidemiological factors.

III. Identification of information sources. This step consists of identifying useful information sources according to the scope of the project and the selected reference guides. This identification depends on data quality and availability, which are taken into account when deciding whether to use these sources during the analysis process.

In the healthcare sector, there are different sources that can be obtained and used for the development of data mining projects.

Generally, countries have an individual healthcare register corresponding to every hospitalization, urgency or procedure associated with a patient. For example, the United States has the Electronic Health Record (EHR), which includes demographics, medical history, medication and allergies, immunization status and observations (among others). Colombia has the Individual Registers of Health Care (RIPS in Spanish), which provide information related to the delivery of health services and demographic variables.

There are other kinds of information sources related to national behaviour, such as national surveys or naming standards. Some national surveys contain demographic and health information that can be used to support the data mining process. Furthermore, the WHO defines the CIE10 standard (ICD-10 in English), which classifies and organizes diseases using a unique code that represents the category and the specific condition.

Some naming standards are associated with the specific health interventions available in each country. For example, the Australian Classification of Health Interventions (ACHI) contains all the procedures performed by HMOs in that country. The Unique Procedures Classification in healthcare (CUPS in Spanish) is the Colombian classification system for this information.

IV. Selection and preparation of healthcare information. The support of health experts is necessary to select the information that is most relevant to solving the problem proposed in step I. Each disease presents different characteristics. For example, in diseases like prostate cancer or pre-eclampsia, sex is not a discriminating variable: the first affects only men, and the latter applies only to pregnant women. In chronic diseases like hypertension, time is a significant variable, since treatment is based on monitoring the patient periodically, with procedures and medicines prescribed in a specific order according to the patient's evolution over time. By contrast, appendicitis treatment is relatively short, so the time variable is not relevant. These considerations must be taken into account in the selection step.

The selected information then goes through a data cleaning process. Generally, health information has data management problems such as duplicated records. Moreover, managing medicine data poses a new challenge to data mining experts, because there is usually no standard for how this data is filled in. For example, the medicine "amoxycillin" can be recorded as "amox" or "amoxycilin". In these cases, word similarity analysis can be used to solve the issue.

The corresponding discretization of continuous data and standardization of information must then be carried out; these are necessary procedures for the execution of mining algorithms, and they should
be consistent with the business rules of the selected diagnosis. For example, in the case of Alzheimer's disease, age categories should be defined from 40 years onwards, consistent with the characteristics of vulnerable populations and the evolution of the disease over time. In contrast, for appendicitis much broader age ranges should be used, since it can occur at any age. The complexity of both discretization and standardization may depend largely on the number of selected information sources and on the absence of ontologies for unifying and standardizing medical terms and concepts.

A statistical analysis of this step is necessary for physicians, because it is important to know the percentage of data that must be cleaned and the problems that arise in the datasets.

V. Information adjustment and preliminary analysis. In this step it is important to analyse the dataset resulting from the previous phase. This analysis aims to identify the main characteristics of the data and to discover new variables that are relevant for a specific situation. In healthcare, variables such as hospitalization window, total cost of treatment and patient satisfaction are relevant in many situations, for example to answer questions like: What is the most expensive treatment? Which one has the lowest satisfaction? Which one has the shortest length of stay?

The preliminary analysis is based on descriptive statistics. The objective is to review the statistical distribution of the data in order to avoid biased results. In this step it is possible to identify how many men and women are involved in the dataset, what the age distribution is, or whether the treatment is based significantly on drugs rather than procedures. The health experts evaluate the results and, if necessary, this step is repeated.

VI. Selection and implementation of data mining algorithms. This step consists of identifying the data mining algorithms that achieve the proposed objectives. In healthcare, mining problems can be classified into three categories. The first is related to the analysis of treatments, where it is feasible to predict the response of the organism to specific procedures. The second is the monitoring and evolution of patients with infectious and chronic diseases. The last is based on verifying the proper provision of health services. For example, in the last category there are situations in which it is appropriate to know the percentage of compliance of treatments with respect to a clinical guideline, or the implications of meeting or failing the clinical guidelines in terms of costs. The data mining algorithms proposed to solve specific problems in the healthcare sector are association rules, sequential patterns and clustering techniques.

Association Rules: this technique identifies relations between variables that tend to occur together. It is possible to characterize a specific group using clustering techniques, and then determine the behaviour of that cluster using association rules.

In healthcare, it is interesting to discover relations among events. For example, if a patient's disease evolves to a chronic phase, the probability of death given specific characteristics can be estimated. Moreover, in the context of quality control, association rules make it possible to discover rules that may or may not correspond to the treatments established in clinical guidelines. It is advisable to use this technique for diagnoses of acute illness. Treatment of this kind of illness is usually applied during the patient's admission to the IPS and does not require periodic monitoring to introduce new procedures or drugs depending on the patient's progress. Furthermore, association rules can be used in problems that analyze the behaviour of treatments.

A possible way to perform quality control in health care is as follows.

Two datasets are created for the patients with the same admission diagnosis and the same method of admission: one with drug treatment information and one with non-drug information. It is important not to mix information from different admission methods, since the treatments in each can be different. Thus, given a diagnosis X in hospitalization, we have the following dataset of medications:

Figure 2: medication dataset

For this example, an association rule would be: "In the pharmacological treatment of a patient with diagnosis X, if medicines M30 and M90 are provided, then medicine M70 will also be provided."

This technique requires two parameters to be customized: support and confidence. The first is the proportion of transactions in the dataset that contain the itemset:

<math>\operatorname{supp}(A) = \frac{\operatorname{occurrence}(A)}{\operatorname{size}(\mathit{dataset})}</math>

The second is defined as the conditional probability P(B | A), that is, the probability of B given the occurrence of A:

<math>\operatorname{conf}(A \Rightarrow B) = \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)}</math>

Sequential Patterns: this technique detects relations between events considering the time periods in which the transactions occurred. In the health care context, many situations involve periodic controls and patient monitoring. As mentioned before, standards, clinical guidelines and protocols specify sequences of treatments that should be applied in a given situation. For this reason, it is feasible to compare a patient's procedures against the reference guides using sequential patterns. Chronic diseases like hypertension or diabetes require patients to return to the IPS several times with the same diagnosis. This kind of problem involves many variables in each time period. The recommendation is to use sequential patterns to analyze the evolution of the patient's health based on the treatments applied.
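The support and confidence measures defined for association rules in step VI can be computed directly over a transaction view of a medication dataset such as the one in Figure 2. The following is a minimal sketch, assuming each transaction is the set of medicines given to one hospitalized patient with diagnosis X; the medicine codes M30, M70 and M90 come from the example rule, while the transactions themselves (and codes M12, M45) are made-up illustration data. It evaluates one given rule rather than mining all rules, which a tool or an Apriori-style algorithm would do.

```python
from fractions import Fraction
from typing import FrozenSet, List

# Hypothetical medication transactions for diagnosis X (illustration data only):
# each frozenset is the set of medicines given to one hospitalized patient.
transactions: List[FrozenSet[str]] = [
    frozenset({"M30", "M90", "M70"}),
    frozenset({"M30", "M90", "M70", "M12"}),
    frozenset({"M30", "M90"}),
    frozenset({"M30", "M45"}),
    frozenset({"M90", "M70"}),
]


def support(itemset: FrozenSet[str], data: List[FrozenSet[str]]) -> Fraction:
    """supp(A): fraction of transactions that contain every item of A."""
    hits = sum(1 for t in data if itemset <= t)
    return Fraction(hits, len(data))


def confidence(
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    data: List[FrozenSet[str]],
) -> Fraction:
    """conf(A => B) = supp(A ∪ B) / supp(A), an estimate of P(B | A)."""
    supp_a = support(antecedent, data)
    if supp_a == 0:
        return Fraction(0)  # A never occurs, so the rule has no evidence
    return support(antecedent | consequent, data) / supp_a


a = frozenset({"M30", "M90"})
b = frozenset({"M70"})
print(support(a, transactions))  # 3/5
print(support(a | b, transactions))  # 2/5
print(confidence(a, b, transactions))  # 2/3
```

Exact fractions keep small counts readable. In line with section 3.1, a rule with low support would not automatically be discarded in health care (for example when it involves mortality); the thresholds remain a decision for the health experts.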
Based on the notion that, at any given time, a clinical event is the prescription of one or more drugs or procedures, we propose creating two datasets: a drugs dataset and a procedures dataset. In both, the records should be grouped by patient, with the clinical events ordered ascending by date and time.
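The grouping described above, and the support of one sequential pattern over the resulting patient sequences, can be sketched as follows. This is a minimal illustration, assuming each record is a hypothetical (patient_id, timestamp, item) tuple and that items with the same timestamp form one clinical event; the codes D1 to D3 and the records are made up. It only counts the support of a single candidate pattern; a full sequential-pattern miner (for example GSP or PrefixSpan, or the IBM Intelligent Miner tool used in the validation) would also enumerate the candidates.

```python
from collections import defaultdict
from datetime import datetime
from fractions import Fraction
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical record layout: (patient_id, timestamp, drug or procedure code).
Record = Tuple[str, datetime, str]
Event = FrozenSet[str]


def build_sequences(records: List[Record]) -> Dict[str, List[Event]]:
    """Group records by patient; items sharing a timestamp form one clinical event."""
    events = defaultdict(lambda: defaultdict(set))
    for patient, when, item in records:
        events[patient][when].add(item)
    # Order each patient's clinical events ascending by date and time.
    return {
        patient: [frozenset(by_time[t]) for t in sorted(by_time)]
        for patient, by_time in events.items()
    }


def contains(sequence: Sequence[Event], pattern: Sequence[Event]) -> bool:
    """True if each itemset of the pattern is included, in order, in a later event."""
    pos = 0
    for itemset in pattern:
        # Greedily take the earliest remaining event that includes this itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next itemset must match a strictly later event
    return True


def sequence_support(pattern: Sequence[Event], sequences: Dict[str, List[Event]]) -> Fraction:
    """Fraction of patients whose event sequence contains the pattern."""
    hits = sum(1 for seq in sequences.values() if contains(seq, pattern))
    return Fraction(hits, len(sequences))


records: List[Record] = [
    ("p1", datetime(2004, 3, 1), "D1"),
    ("p1", datetime(2004, 3, 1), "D2"),
    ("p1", datetime(2004, 6, 1), "D3"),
    ("p2", datetime(2004, 2, 5), "D1"),
    ("p2", datetime(2004, 9, 9), "D3"),
    ("p3", datetime(2004, 1, 1), "D3"),
    ("p3", datetime(2004, 4, 4), "D1"),
]
sequences = build_sequences(records)
pattern = [frozenset({"D1"}), frozenset({"D3"})]
print(sequence_support(pattern, sequences))  # 2/3: p3 receives D3 before D1
```

Greedy earliest matching is sufficient for this containment test, because matching an itemset as early as possible never removes a later event that a subsequent itemset could use.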
Figure 3: structure of a sequential pattern dataset

All the clinical events of a patient, arranged in order, can be seen together as a sequence, where each event corresponds to a set of drugs or procedures. Figure 4 shows the sequences obtained for previous patients.

Figure 4: sequence of patients' procedures

Clustering: this technique finds groups of elements with similar characteristics. In healthcare, it is very common to analyze populations based on specific characteristics. In addition, this technique can be used to validate that the right treatments are given to the right people: the reference guides define specific treatments for people with specific demographic characteristics. In this case, clustering determines subsets based on procedures and demographic information.

There are also situations in which data quality is poor, so it is impossible to analyze particular issues in the dataset because the data cannot be trusted. The suggestion is to use clustering techniques to generalize the main characteristics of a specific group. To perform this analysis, a dataset must be created that includes patient information such as gender, age, marital status and race (among others), as well as the drugs and/or procedures provided, grouped into the treatments found in the previous section, as shown in Figure 5:

Figure 5: dataset for clustering technique

As with the previous algorithms, it is important to analyze the number of people supporting each cluster before drawing any conclusions. It is also essential to understand that a cluster representing a very small percentage of the population should not necessarily be discarded. It all depends on the clinical context being evaluated and on the criteria of the medical expert.

VII. Result validation and impact analysis. This final step concerns tuning the data mining model based on the health experts' feedback and the results achieved. A detailed study of the results is made by a board of medical experts in the area, supported by a technical group, to determine whether or not to repeat some of the above steps, possibly with a change in strategy or scope, or with a refinement of the data used, in order to reach specific conclusions.

4. VALIDATION
This proposal is validated based on the PMH process and on the product obtained by applying its steps to the solution of a real problem in Colombia's health care sector.

Our proposal is a specialization of the CRISP-DM methodology. The CRISP-DM process has been validated in several domains and refined over more than a decade, based on its application in many data mining projects. In that sense, we refined the CRISP-DM phases to incorporate knowledge about the special issues of the health care sector, reducing the time and resources needed to understand this particular domain.

The PMH process incorporates the diagnosis and reference guide selection into the business understanding phase of CRISP-DM, and presents the selection criteria for these steps. Furthermore, the other phases make it possible to understand the principal problems associated with data quality in health care and to determine which algorithms should be used to solve several problems in this sector.

The next sections focus on the validation of the product. They apply the PMH process to quality control in Colombia's health care sector and validate the results based on the experts' criteria. The exercise consists of two iterations of the process. The specific steps of PMH are described below.

4.1 Problem description
Hypertension is a chronic disease that affects 20% of people in the world. It is considered the first cause of morbidity
and the most representative disease related to cardiovascular conditions. For this reason, the objective of this exercise was to evaluate the pharmacological and non-pharmacological treatment of this disease in Colombia.

4.2 First iteration
Step I consists of describing the problem according to the context, in terms of the diagnosis and the type of disease to be tackled. In this case, it is relevant to analyze the characteristics of hypertension. This is a chronic disease, generally asymptomatic, that requires continuous medical assistance. The typical complications of hypertension are related to heart failure. These complications imply hospitalizations, urgencies and complex procedures.

According to the health experts, there are different reference guides related to hypertension, but its treatment may differ from one country to another. For this reason, the selected reference guide was the "Clinical guideline for hypertension disease" proposed by the Association of Faculties of Medicine of Colombia [3].

As mentioned in step III, Colombia has the Individual Registers of Health Care (RIPS). The purpose of this data is billing for the health services delivered by Colombian IPS. The information was collected from HMOs and the Ministry of Social Protection for the years 2003 to 2006 (49,000,000 individual health care registers). Furthermore, the RIPS data are recorded using the CIE10 and CUPS standards, so this information is also taken into account.

According to the experts' opinion, the relevant information for this analysis is described in Table 2:

Variables
Principal diagnosis
IPS identification
Date
Sex
Department
Type of medical service
Medicine
Procedures
Length of stay
Death diagnosis
Related diagnosis

The statistical analysis of the dataset shows that 77% of the registers correspond to procedures, 17% to medical appointments and 6% to medicine prescriptions. Moreover, 67% correspond to men and 33% to women. On the other hand, 16% of the patients return to the IPS for health controls.

In the pharmacological treatment of hypertension, we want to analyze the most relevant medication sequences. In this case, the time variable is highly relevant because we want to trace the prescription of medicines for a specific disease. For this reason, we used sequential patterns, following the ideas presented in subsection 3.2. Our model was implemented using the IBM Intelligent Miner 8.1 data mining tool.

Figure 6 shows the results of the mining model.

Figure 6: Sequential patterns results

The main result is the use of captopril: more than 40% of the patients were treated with this medicine. According to the health experts, captopril is used in monotherapy treatments and is prescribed for economic reasons. Another important conclusion is related to the data quality of medicines; a specific naming standardization was used to resolve the problem. Likewise, several manual processes were carried out to mitigate the problem of duplicated records.

4.3 Second iteration
Based on the results of the first iteration, a second iteration of the steps suggested in PMH was performed. The health experts proposed analyzing whether the health system may incur higher costs because of the prescription of captopril. Using the RIPS data, we want to determine the complications related to these patients.

According to PMH, it is necessary to prepare the data that will be used by the model. For this reason, the data include demographic information and relevant aspects of the evolution of a patient's disease.

In this case, new variables had to be included based on the health experts' recommendations. These variables are associated with the evolution to a chronic phase in the clinical history of patients. For this reason, the following were introduced: the date of the patient's complication, the associated diagnosis, and the number of hospitalizations or procedures before and after the complication of hypertension. The next step of the methodology proposes a preliminary analysis of the information. In this case, the information consists of all the records of the patients who have suffered this disease.

To describe the complications of a patient, the sequential clustering technique was used. As described in section 3, this technique makes it possible to find clusters of patients with similar sequences and characteristics.

The results of the second iteration show that patients with similar sequences associated with the use of captopril have complications such as chronic heart failure, hypertensive crisis or heart attacks.

In general, it is important to consider that captopril is prescribed for economic reasons. This strategy is useful in the short term, but in the long term we can observe that patients with this kind of treatment return
to the healthcare institution with complications of hypertension. These complications increase the cost of the illness.

5. CONCLUSIONS AND FUTURE WORK
This paper proposes a process model to guide the data mining process in the health care sector. It suggests a set of iterative and optional steps to improve the results of the mining process. The process model was evaluated through an analysis of the quality of service for the treatment of hypertension in Colombia. The results show that it is possible to establish new hypotheses about the datasets and to revalidate them using the proposed process model. Likewise, these results show some of the support the model gives data mining experts in guiding their process, especially regarding knowledge of the healthcare context, such as data sources, reference guides and data mining techniques.

An exhaustive validation of the process model, in terms of a formal comparison between CRISP-DM and PMH, is considered future work. At the same time, it will be interesting to address new kinds of questions from the experts' point of view using this process model, in particular the identification of an epidemiological profile of the Colombian population.

6. ACKNOWLEDGMENT
The authors appreciate the support of Jose Abasolo, professor at Los Andes University, who provided the initial ideas and suggestions for the development of this article.

7. REFERENCES
[1] E. A.M. and H. Fu. Privacy preserving distributed learning clustering of healthcare data using cryptography protocols. In Computer Software and Applications Conference Workshops (COMPSACW), 2010 IEEE 34th Annual, pages 140–145, July 2010.
[2] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, and R. Wirth. CRISP-DM 1.0: Step-by-step data mining guide. Technical report, The CRISP-DM consortium, August 2000.
[3] A. colombiana de Facultades de Medicina. Guía clínica para la hipertensión arterial. http://www.redsalud.gov.cl/archivos/guiasges/hipertension arterial primaria.pdf.
[4] eHow Health. Definition of clinical protocol.
[5] X. Fang. Are you becoming a diabetic? A data mining approach. In Sixth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD '09), volume 5, pages 18–22, August 2009.
[6] K. Harleen and S. K. Wasan. Empirical study on application of data mining techniques in health care. Journal of Computer Science 2, pages 194–200, 2006.
[7] A. Li, S. Wang, H. Zheng, L. Ji, and J. Wu. A novel abnormal ECG beats detection method. In 2nd International Conference on Computer and Automation Engineering (ICCAE), volume 1, pages 47–51, February 2010.
[8] B. M.J.A. and L. G.S. Mastering data mining. 2000.
[9] L. N. Predicting the risk of future hospitalization. In Database and Expert Systems Applications (DEXA), 2010 Workshop on, pages 120–124, September 2010.
[10] S. N. and T. A. M. The relevance of data warehousing and data mining in the field of evidence-based medicine to support healthcare decision making. 2002.
[11] L. Nevine and M. Malek. Data mining for cancer management in Egypt case study: Childhood acute lymphoblastic leukemia. 2005.
[12] N. I. of Health. About clinical practice guidelines.
[13] M. of Social Protection. Resolución 412. http://mps.minproteccionsocial.gov.co/pars/caja-herram/documentos/Biblioteca/CompendioNormativo/resolucion 412 00.pdf, 2002.
[14] A. S. S. Raza and A. S. Raza. A case for supplementing evidence base medicine with inductive clinical knowledge: Towards a technology-enriched integrated clinical evidence system. In Proceedings of the Fourteenth IEEE Symposium on Computer-Based Medical Systems, CBMS '01, pages 5–, Washington, DC, USA, 2001. IEEE Computer Society.
[15] N. R. T. and P. Jian. Introduction to the special issue on data mining for health informatics. SIGKDD Explor. Newsl., 9:1–2, June 2007.
[16] van Driel M. A., C. K., K. P. P., L. J. A., and B. H. G. A new web-based data mining tool for the identification of candidate genes for human genetic disorders. Eur J Hum Genet, 11(1):57–63+, 2003.
[17] C. Yu, P. L. Henning, C. W. W., and O. Jorn. Drug exposure side effects from mining pregnancy data. SIGKDD Explor. Newsl., 9:22–29, June 2007.