<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multidimensional Mining over Big Healthcare Data: A Big Data Analytics Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario Bochicchio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfredo Cuzzocrea</string-name>
          <email>alfredo.cuzzocrea@dia.units.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucia Vaira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonella Longo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Zappatore</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Salento</institution>
          ,
          <addr-line>Lecce</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trieste and ICAR-CNR</institution>
          ,
          <addr-line>Trieste</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>24</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>Nowadays, a great deal of attention is being devoted to big data analytics in complex healthcare environments. Fetal growth curves, which are a classic case of big healthcare data, are used in prenatal medicine for the early detection of potential fetal growth problems, to estimate the perinatal outcome and to promptly treat possible complications. However, the currently adopted curves and the related diagnostic techniques have been criticized because of their poor precision. New techniques, based on the idea of customized growth curves, have been proposed in the literature. In this perspective, this paper discusses the problem of building customized or personalized fetal growth curves by means of big data techniques. The proposed framework introduces the idea of summarizing the massive amounts of (input) big data via multidimensional views, on top of which well-known Data Mining methods like clustering and classification are applied. Overall, this defines a multidimensional mining approach targeted to complex healthcare environments. A preliminary analysis of the effectiveness of the framework is also provided.</p>
      </abstract>
      <kwd-group>
        <kwd>Mining Big Data</kwd>
        <kwd>Big Healthcare Data</kwd>
        <kwd>Healthcare Systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Big data analytics in complex healthcare environments (e.g., [
        <xref ref-type="bibr" rid="ref13 ref14 ref15 ref16 ref17">13,14,15,16,17</xref>
        ]) are currently of
great interest, following well-known principles of big data management and
mining (e.g., [
        <xref ref-type="bibr" rid="ref18 ref19 ref20">18,19,20</xref>
        ]). Here, the main problem consists in devising models,
techniques and algorithms that extract useful knowledge from enormous amounts of
(big) data, with the goal of implementing so-called big data intelligence, i.e., deriving
decisions, decision processes, guidelines and policies aimed at improving the target
healthcare system (e.g., [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]). Fetal growth curves, which are a classic case of big
healthcare data, are very important in prenatal medicine for evaluating fetal well-being.
Indeed, they represent a mature and well-established practice for the early detection of potential
fetal growth restriction, for estimating the perinatal outcome and for promptly treating possible
complications. The general idea underlying this test is very simple and effective: fetuses
grow following a regular trend as a function of the gestational age. Therefore, their
well-being can be assessed by tracking their sizes over time and by comparing them
with a reference growth curve known to be “good”. The implementation of the idea, based
on ultrasound images of the maternal abdomen, is quite simple, non-invasive and
inexpensive.
      </p>
      <p>
        In clinical routine, fetal biometric parameters coming from this test are compared
with a set of reference parameters, which are usually provided by the same test
equipment. When results are too large or too small for the gestational age, they are classified
as “potentially pathologic” and supplementary clinical tests are required/performed. A
very problematic aspect of this practice is that several sets of fetal growth curves are
reported in the literature, and adopting the right one is crucial to avoid errors (e.g.,
wrongly classifying fetuses as pathologic or non-pathologic) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This is a
hot topic for the obstetrics and gynecology community [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], since the currently adopted
references do not account for several mother-related aspects, such as ethnic group, food, drugs and
smoking. Indeed, it has been recognized that these and other factors have a non-negligible
influence on the actual growth trends of fetuses and, hence, on the overall number of
false positives/negatives and of further unnecessary tests. In current practice, failure
rates as high as 46% are reported in the literature [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], even considering the standard defined
by the World Health Organization (WHO), so that, in several cases, it is hard to decide
whether the fetus has to be considered pathologic or not. For this reason, customized
fetal growth charts [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] have been proposed as an alternative to “literature-based”
growth curves. The increasing acceptance of this best practice suggests a new and
ambitious perspective: the creation of an online service able to collect and analyze the
worldwide production of fetal growth data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], in order to support obstetricians in the
production of customized/personalized fetal growth curves. The clinical understanding of
the phenomenon described by such a large amount of data is extremely intriguing,
because of the underlying idea of finally achieving a complete understanding of the fetal growth
processes. On the other hand, it is also challenging, for both technical and medical
reasons. These aspects are discussed in the remainder of the paper. The main goal
of the paper is to assess the feasibility of such an online service. To this end, the
paper proposes a big data analytics framework for building customized or personalized
fetal growth curves by means of innovative big data techniques. The proposed
framework introduces the idea of summarizing the massive amounts of (input) big data via
multidimensional views [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] on top of which well-known Data Mining methods like
clustering and classification are applied. This overall defines a multidimensional
mining approach, targeted to complex healthcare environments. A preliminary analysis on
the effectiveness of the framework is also proposed.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Building Customized Fetal Growth Curves by means of Big Data Analysis Techniques</title>
      <p>
In this Section, we provide the main contribution of our research, i.e., the principles and
definitions of a big data analytics framework for supporting multidimensional mining
in complex healthcare environments. The idea of developing an online service to collect
and analyze large datasets about maternal/fetal wellbeing and fetal growth, and
to support gynecologists and obstetricians in diagnosing fetal growth restrictions, has
been considered valuable by several authors [
        <xref ref-type="bibr" rid="ref1 ref3">1,3</xref>
        ]. Actually, scaling this approach up
to the worldwide production of fetal-maternal data could drive the medical community
toward a deeper understanding of fetal pathologies, but several aspects have to be
considered.
      </p>
      <p>From a technical point of view, due to the volume of data to collect and analyze, the
variety of descriptors associated with each mother/fetus and the velocity inherent to the
phenomenon, Big Data techniques must be adopted. Indeed, every year there are about
160 million newborns in the world: on average, this is equivalent to about 300
newborns per minute. Considering that, according to international guidelines, about 10
clinical tests (3 for growth tracking) are performed during pregnancy for each healthy
woman, and that for pathologic fetuses this number is significantly higher, the global
production of data on the phenomenon can be estimated as a continuous stream of 1,000
– 10,000 new medical records per second, with an overall volume of 1 to 10 Petabytes
per year. For the purposes of this paper, a more realistic scenario including 10% to 20%
of fetal-growth data from at least two European countries is sufficient to test the
proposed framework and tune it for further extension.</p>
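      <p>The orders of magnitude quoted above can be checked with a few lines of arithmetic. The following is only a back-of-the-envelope sketch: the inputs are the round figures given in the text, and the average payload per test is an implied quantity computed here for illustration, not a value reported by the study.</p>
      <preformat>
```python
# Back-of-the-envelope check of the data-volume estimate.
# Inputs are the round figures quoted in the text; nothing here is measured.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000
PETABYTE = 10 ** 15                     # decimal petabyte, in bytes

births_per_year = 160_000_000           # about 160 million newborns per year
tests_per_pregnancy = 10                # about 10 clinical tests (healthy case)

births_per_second = births_per_year / SECONDS_PER_YEAR
births_per_minute = births_per_second * 60
tests_per_second = births_per_second * tests_per_pregnancy

# Average payload per test implied by 1 to 10 petabytes per year
# (an illustrative derived quantity, not a figure from the paper).
tests_per_year = births_per_year * tests_per_pregnancy
bytes_per_test_low = 1 * PETABYTE / tests_per_year
bytes_per_test_high = 10 * PETABYTE / tests_per_year

print(f"births per second: {births_per_second:.2f}")
print(f"births per minute: {births_per_minute:.0f}")
print(f"tests per second (healthy pregnancies only): {tests_per_second:.1f}")
print(f"implied payload per test: {bytes_per_test_low:.0f} to {bytes_per_test_high:.0f} bytes")
```
      </preformat>
      <p>Under these assumptions, 160 million births per year correspond to roughly 5 births per second, i.e., about 300 per minute, and a yearly volume of 1 to 10 Petabytes implies an average of a few hundred kilobytes to a few megabytes per test.</p>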
      <p>According to current methods adopted in clinical practice for fetal-maternal
wellbeing assessment, the main algorithms to analyze this stream would be based on: (i)
the least-squares method, (ii) multidimensional analysis, (iii) clustering and classification
techniques. The overall computational load is hard to estimate because the problem
is still under investigation, but distributed approaches and parallelization techniques
are likely to be adopted. Moreover, the variety of data types (both structured and
unstructured) is one of the main characteristics of this research field, because the
elements affecting fetal growth are not completely known and, every year, new variables
emerge from the influence of new pathologies, medicines, therapies, pollutants, etc. This
heterogeneity is problematic to manage, but it is an important and unavoidable
characteristic of the problem.</p>
      <p>Referring to the algorithmic part, the possibility of constructing dynamic and
customized fetal growth curves is mainly based on the following aspects:
        <list list-type="bullet">
          <list-item><p>the application of multidimensional analysis techniques, which make it possible
both to summarize the massive amounts of (input) big data via multidimensional views and
to identify groups of patients (fetuses, in our case) who share similar growth patterns
over time;</p></list-item>
          <list-item><p>the possibility of searching for possible correlations between fetal growth and
parameters like ethnic group, maternal age, fetal gender, and so on.</p></list-item>
        </list>
      </p>
      <p>The hypothesis is that fetuses at the same gestational age, with similar genetic
makeup (e.g., ethnicity, familial aspects, and so forth) and in similar environmental
conditions (e.g., food, smoking, drugs, and so forth), follow similar growth curves. Such
groups of fetuses are referred to, in the remainder of the paper, as Homogeneous Patient
Groups (HPGs). This makes it possible to identify patients who share common profiles in order to
determine whether a given fetus is potentially pathologic, i.e., when his/her growth
parameters differ from those of the HPG to which he/she belongs. Membership in
a specific group is established at run-time, and a specific fetus can belong to one or more
groups simultaneously, according to the analyzed features.</p>
      <p>The diagnostic process is based on the following three main steps:
        <list list-type="simple">
          <list-item><label>a)</label><p>build the summarizing multidimensional view of the target experiment;</p></list-item>
          <list-item><label>b)</label><p>initially, the HPG of each mother/fetus is not known; it can be identified
through anamnesis or specific tests and exams;</p></list-item>
          <list-item><label>c)</label><p>the wellbeing of each new fetus is assessed by comparing its actual sizes
with the reference charts of the HPG identified by the previous step.</p></list-item>
        </list>
      </p>
      <p>
        In terms of multidimensional analysis, patients can be represented as
multidimensional points, and HPGs as regions, of a multidimensional space whose dimensions are
all the parameters affecting fetal growth (ethnicity, maternal weight, height, familial
aspects, food, and so forth), and whose measures of analysis are the biometric
parameters of the fetus. In this sense, step b) corresponds to identifying the patient’s nearest
HPG according to some distance measure, while step c) requires computing the average
size, the variance, and the corresponding percentile on a purposely defined
(sufficiently wide and up-to-date) subset of elements of the same HPG. Moreover, HPGs can
be periodically updated (e.g., every four to six months, considering that growth
variations are not expected to emerge over shorter periods) by means of a suitable clustering
algorithm. Considering that clustering algorithms are computationally intensive and
cannot be repeated at the arrival of every new fetal biometric measure, a suitable
classification algorithm must be exploited in order to decide to which precomputed cluster
the new sample belongs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
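      <p>Steps b) and c) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each HPG is summarized by a centroid in a normalized feature space, that Euclidean distance is the chosen distance measure, and that a nearest-centroid rule plays the role of the classification algorithm; the group names, centroids and measures below are hypothetical and are not taken from the study.</p>
      <preformat>
```python
import math
import statistics

# Hypothetical HPG centroids in a normalized 2-D feature space
# (e.g. scaled maternal age and maternal height); values are made up.
HPG_CENTROIDS = {
    "hpg_a": (0.2, 0.3),
    "hpg_b": (0.7, 0.8),
}

# Hypothetical femur lengths (mm) at one gestational week for each HPG.
HPG_MEASURES = {
    "hpg_a": [30.0, 31.0, 32.0, 33.0, 34.0],
    "hpg_b": [34.0, 35.0, 36.0, 37.0, 38.0],
}


def nearest_hpg(features):
    """Step b): return the HPG whose centroid is closest (Euclidean distance)."""
    return min(
        HPG_CENTROIDS,
        key=lambda name: math.dist(features, HPG_CENTROIDS[name]),
    )


def assess(features, measure):
    """Step c): compare a new measure with the statistics of the nearest HPG."""
    name = nearest_hpg(features)
    reference = HPG_MEASURES[name]
    mean = statistics.mean(reference)
    variance = statistics.variance(reference)  # sample variance
    # Empirical percentile rank: share of reference values not above the measure.
    not_above = sum(1 for value in reference if measure >= value)
    percentile = 100.0 * not_above / len(reference)
    return name, mean, variance, percentile


print(assess((0.25, 0.35), 33.0))
```
      </preformat>
      <p>In such a scheme, the expensive clustering step would only be needed to recompute the centroids periodically, while each new sample is assigned with a cheap distance computation.</p>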
      <p>The innovative aspects of this method with respect to other approaches known in
the literature can be summarized as follows:
        <list list-type="bullet">
          <list-item><p>HPGs are periodically updated rather than statically defined by means of
standardized growth curves;</p></list-item>
          <list-item><p>reference growth curves are associated with each HPG (hundreds or
thousands) rather than with a few ethnic groups;</p></list-item>
          <list-item><p>fetal growth curves are continuously updated with the data coming from
patients under examination, thus also revealing the long-term population
trends in patients’ growth.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Implementation, Preliminary Results and Discussion</title>
      <p>
We conducted a preliminary analysis to prove the effectiveness of our proposed
framework, on the basis of real-life big healthcare data coming from a
government/university research project. This Section describes the outcome of this part of our research.</p>
      <p>In order to explore the implementation details of the proposed big data analytics
framework for building customized fetal growth curves, we decided to collect and
analyze an actual sample of fetal-maternal data coming from two facilities located in the
Apulia region in southern Italy, namely: a university clinic, also involved in research about
malformation and diabetes in pregnancy, and a general hospital. The facilities serve a
basin of about 1.5 million citizens and assist more than 10,000 pregnant women per
year. The sample, concerning about 500 pregnant women under the care of 8 medical
doctors, consists of a quite sparse table with about 2,500 records and 60 attributes
grouped into 9 main categories (having obvious meaning):
        <list list-type="bullet">
          <list-item><p>Personal Data;</p></list-item>
          <list-item><p>Parity;</p></list-item>
          <list-item><p>Fetal Biometry;</p></list-item>
          <list-item><p>Diabetic Profile;</p></list-item>
          <list-item><p>Maternal Biometry;</p></list-item>
          <list-item><p>Familiarity;</p></list-item>
          <list-item><p>Glycemic Profile;</p></list-item>
          <list-item><p>Other Pathologies;</p></list-item>
          <list-item><p>Delivery Outcome.</p></list-item>
        </list>
      </p>
      <p>For each category, the percentage of information completeness, defined as the
number of non-null records over the total number of records, is shown in Fig. 1.</p>
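      <p>The completeness measure defined above is straightforward to compute. The sketch below assumes a simplified layout with one value per category and per record, where a missing value is represented by None; the records are hypothetical and only three of the nine categories are shown.</p>
      <preformat>
```python
# Hypothetical records: None marks a missing (null) value.
records = [
    {"Personal Data": "x", "Parity": 1, "Fetal Biometry": 31.5},
    {"Personal Data": "y", "Parity": None, "Fetal Biometry": None},
    {"Personal Data": "z", "Parity": 0, "Fetal Biometry": None},
    {"Personal Data": "w", "Parity": 2, "Fetal Biometry": 40.2},
]


def completeness(rows, category):
    """Percentage of rows whose value for the given category is not null."""
    filled = sum(1 for row in rows if row.get(category) is not None)
    return 100.0 * filled / len(rows)


# One completeness value per category, as plotted in a chart like Fig. 1.
report = {name: completeness(records, name) for name in records[0]}
print(report)
```
      </preformat>
      <p>Note that a legitimate zero (e.g., a parity of 0) counts as a filled value; only nulls lower the completeness.</p>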
      <p>
        This collection permitted us to better understand the nature and the variability of data
involved in maternal and fetal wellbeing monitoring. Moreover, it permitted us to
define a set of Dimensional Fact Models (DFM) [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] able to describe a typical
fetal-maternal test, along with its variable aspects. A simplified version of the main DFM is
shown in Fig. 2.
      </p>
      <p>According to the guidelines of the WHO on fetal growth curves, the available data
sample has also been used to extract the reference curves for the target population to be
analyzed. The process included a preliminary normality test and a linear
regression. This preliminary step was essential for a quantitative evaluation of the
reduction of false positives/negatives obtained with the newly proposed method.</p>
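      <p>As an illustration of the regression step, the following sketch fits a reference line by ordinary least squares and expresses a new measure as a distance from that line in units of residual standard deviation. It is a simplified example under stated assumptions: the measurements are hypothetical, a single biometric parameter is modelled as a linear function of gestational week, and the normality test mentioned above is not reproduced.</p>
      <preformat>
```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_sd(xs, ys, intercept, slope):
    """Standard deviation of the residuals (n - 2 degrees of freedom)."""
    squared = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / (len(xs) - 2))


# Hypothetical femur lengths (mm) by gestational week; not data from the study.
weeks = [20, 22, 24, 26, 28, 30]
femur = [33.0, 39.0, 43.0, 49.0, 53.0, 59.0]

intercept, slope = fit_line(weeks, femur)
sd = residual_sd(weeks, femur, intercept, slope)


def z_score(week, value):
    """Distance of a measure from the fitted curve, in residual SD units."""
    return (value - (intercept + slope * week)) / sd


print(f"slope = {slope:.3f} mm/week, intercept = {intercept:.3f} mm")
print(f"z-score of 47.0 mm at week 24: {z_score(24, 47.0):.2f}")
```
      </preformat>
      <p>A measure whose z-score falls far from zero would be a candidate for the “potentially pathologic” class; where exactly to place the threshold is a clinical choice that this sketch does not address.</p>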
      <p>Fig. 2. Simplified Dimensional Fact Model of a fetal-maternal test. Labels recoverable from the figure: Patient; Pregnancy profile (Completed, Preterm, Abortions, Born alive); Growth Test (Biparietal Diameter, Head Circumference, Abdominal Circumference, Femur Length, Weight gain, Regular insulin, Glycemic profile at 10, 15, 22 and 14, HbA1c, Proteinuria); Gestational Weeks; Last Menstrual Period; Days, Weeks, Trimesters; Day, Month, Year.</p>
      <p>Fig. 3. EM Clustering: BIC vs. Number of components (top) and Classification (bottom).</p>
      <p>
        As regards the clustering analysis, the above-defined multidimensional model
has been implemented on top of the OLAP server Mondrian [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], while standard
implementations of the density-based and EM clustering techniques have been taken from
the R environment [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. In our preliminary experimentation, we noticed that, while the
EM algorithm converges on two overlapping clusters, no clear results come out of the
density-based algorithm, probably due to the highly non-homogeneous nature of the
processed dataset. The results achieved by applying the EM algorithm are shown in
Fig. 3 (top: the Bayesian Information Criterion, used to estimate the number of clusters in
the analyzed sample; bottom: the original dataset and the two resulting
clusters). Indeed, this approach can be improved by overcoming the well-known unsatisfactory
performance of density-based clustering algorithms (e.g., [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]), for instance by adopting
a kind of adaptive threshold, as in [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ].
      </p>
      <p>Fig. 4. EM Clustering: Execution Time vs. Sample Size.</p>
      <p>Finally, Fig. 4 reports the execution time of the adopted algorithm as a
function of the problem size. This parameter is important to decide the maximum size of the
clustered sample as well as how frequently it can be updated. The result shows that the
execution time increases more than exponentially and that datasets of 5,120 fetal sizes
can be processed in about 4 minutes on an Intel Core i5 @ 2.5 GHz, which is
compatible with the discussed problem.</p>
      <p>The application of these methodologies (i.e., multidimensional summarization and
clustering analysis) confirms the effectiveness of our proposed framework in
dealing with multidimensional mining in complex healthcare environments via big data
analytics techniques.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4 Conclusions and Further Work</title>
      <p>
In this paper, we have introduced a big data analytics framework targeted to big
healthcare data. The framework realizes a multidimensional mining approach for
building customized or personalized fetal growth curves. The main idea consists in
summarizing the massive amounts of (input) big data via multidimensional views, on top of
which well-known Data Mining methods like clustering and classification are applied.
A preliminary analysis of the effectiveness of the framework has also been provided.</p>
      <p>
        Future work is mainly oriented towards extending our big data analytics framework
by means of innovative computing metaphors such as adaptiveness (e.g., [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]) and
uncertainty (e.g., [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gardosi</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyan</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahota</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Symonds</surname>
            <given-names>E.M.</given-names>
          </string-name>
          , “
          <article-title>Customised antenatal growth charts”</article-title>
          ,
          <source>Lancet</source>
          <volume>339</volume>
          (
          <issue>8788</issue>
          ), pp.
          <fpage>283</fpage>
          -
          <lpage>287</lpage>
          ,
          <year>1992</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bochicchio</surname>
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Longo</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaira</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malvasi</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tinelli</surname>
            <given-names>A.</given-names>
          </string-name>
          , “
          <article-title>Creating dynamic and customized fetal growth curves using cloud computing”</article-title>
          ,
          <source>in: Proceedings of BIBE</source>
          <year>2013</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          ,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Tinelli</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bochicchio</surname>
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaira</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malvasi</surname>
            <given-names>A.</given-names>
          </string-name>
          , “
          <article-title>Ultrasonographic Fetal Growth Charts: An Informatic Approach by Quantitative Analysis of the Impact of Ethnicity on Diagnoses Based on a Preliminary Report on Salentinian Population”</article-title>
          ,
          <source>BioMed Research International</source>
          <year>2014</year>
          (
          <volume>1</volume>
          ),
          <source>Article ID 386124</source>
          ,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Giorlandino</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padula</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cignini</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mastrandrea</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vigna</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buscicchio</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giorlandino</surname>
            <given-names>C.</given-names>
          </string-name>
          , “
          <article-title>Reference interval for fetal biometry in Italian population”</article-title>
          ,
          <source>Journal of Prenatal Medicine</source>
          <volume>3</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>62</fpage>
          -
          <lpage>68</lpage>
          ,
          <year>2009</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Johnsen</surname>
            <given-names>S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilsgaard</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rasmussen</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sollien</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiserud</surname>
            <given-names>T.</given-names>
          </string-name>
          , “
          <article-title>Longitudinal reference charts for growth of the fetal head, abdomen and femur”</article-title>
          ,
          <source>Eur J Obstet Gynecol Reprod Biol</source>
          <volume>127</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>172</fpage>
          -
          <lpage>185</lpage>
          ,
          <year>2006</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Knorr-Held</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Best</surname>
            <given-names>N. G.</given-names>
          </string-name>
          ,
          <article-title>“A shared component model for detecting joint and selective clustering of two diseases”</article-title>
          ,
          <source>Journal of the Royal Statistical Society</source>
          <volume>164</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>73</fpage>
          -
          <lpage>85</lpage>
          ,
          <year>2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>McLachlan</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peel</surname>
            <given-names>D.</given-names>
          </string-name>
          , “Finite Mixture Models”, John Wiley &amp; Sons, New York, USA,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>McLachlan</surname>
            <given-names>G.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnan</surname>
            <given-names>T.</given-names>
          </string-name>
          , “
          <article-title>The EM Algorithm and Extensions”</article-title>
          , John Wiley &amp; Sons, New York, USA, second edition,
          <year>2008</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Bruno</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cerquitelli</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiusano</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <article-title>“A Clustering-Based Approach to Analyse Examinations for Diabetic Patients”</article-title>
          ,
          <source>in: Proceedings of ICHI</source>
          <year>2014</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          ,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Tan</surname>
            <given-names>P.-N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steinbach</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            <given-names>V.</given-names>
          </string-name>
          , “Introduction to Data Mining”,
          <publisher-name>Addison-Wesley</publisher-name>
          , Boston, USA,
          <year>2006</year>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Cerquitelli</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiusano</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            <given-names>X.</given-names>
          </string-name>
          , “
          <article-title>Exploiting clustering algorithms in a multiple-level fashion: A comparative study in the medical care scenario”</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>55</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>297</fpage>
          -
          <lpage>312</lpage>
          ,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nithya</surname>
            <given-names>N.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duraiswamy</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomathy</surname>
            <given-names>P.</given-names>
          </string-name>
          , “
          <article-title>A Survey on Clustering Techniques in Medical Diagnosis”</article-title>
          ,
          <source>International Journal of Computer Science Trends and Technology</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>17</fpage>
          -
          <lpage>22</lpage>
          ,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sakr</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elgammal</surname>
            <given-names>A.</given-names>
          </string-name>
          , “
          <article-title>Towards a Comprehensive Data Analytics Framework for Smart Healthcare Services”</article-title>
          ,
          <source>Big Data Research</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>44</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lee</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hyun</surname>
            <given-names>S.J.</given-names>
          </string-name>
          , “
          <article-title>A data acquisition architecture for healthcare services in mobile sensor networks”</article-title>
          ,
          <source>in: Proceedings of BigComp</source>
          <year>2016</year>
          , pp.
          <fpage>439</fpage>
          -
          <lpage>442</lpage>
          ,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Barkhordari</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niamanesh</surname>
            <given-names>M.</given-names>
          </string-name>
          , “
          <article-title>ScaDiPaSi: An Effective Scalable and Distributable MapReduce-Based Method to Find Patient Similarity on Huge Healthcare Networks”</article-title>
          ,
          <source>Big Data Research</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>9</fpage>
          -
          <lpage>27</lpage>
          ,
          <year>2015</year>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mezghani</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Exposito</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drira</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Da Silveira</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pruski</surname>
            <given-names>C.</given-names>
          </string-name>
          , “
          <article-title>A Semantic Big Data Platform for Integrating Heterogeneous Wearable Data in Healthcare”</article-title>
          ,
          <source>Journal of Medical Systems</source>
          <volume>39</volume>
          (
          <issue>12</issue>
          ), p.
          <fpage>185</fpage>
          ,
          <year>2015</year>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Begoli</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dunning</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frasure</surname>
            <given-names>C.</given-names>
          </string-name>
          , “
          <article-title>Real-Time Discovery Services over Large, Heterogeneous and Complex Healthcare Datasets Using Schema-Less, Column-Oriented Methods”</article-title>
          ,
          <source>in: Proceedings of BigDataService</source>
          <year>2016</year>
          , pp.
          <fpage>257</fpage>
          -
          <lpage>264</lpage>
          ,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Cuzzocrea</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saccà</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
            <given-names>J.D.</given-names>
          </string-name>
          , “
          <article-title>Big data: a research agenda”</article-title>
          ,
          <source>in: Proceedings of IDEAS</source>
          <year>2013</year>
          , pp.
          <fpage>198</fpage>
          -
          <lpage>203</lpage>
          ,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Cuzzocrea</surname>
            <given-names>A.</given-names>
          </string-name>
          , “
          <article-title>Analytics over Big Data: Exploring the Convergence of Data Warehousing, OLAP and Data-Intensive Cloud Infrastructures”</article-title>
          ,
          <source>in: Proceedings of COMPSAC</source>
          <year>2013</year>
          , pp.
          <fpage>481</fpage>
          -
          <lpage>483</lpage>
          ,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Yu</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuzzocrea</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeong</surname>
            <given-names>D.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maydebura</surname>
            <given-names>S.</given-names>
          </string-name>
          , “
          <article-title>On Managing Very Large Sensor-Network Data Using Bigtable”</article-title>
          ,
          <source>in: Proceedings of CCGRID</source>
          <year>2012</year>
          , pp.
          <fpage>918</fpage>
          -
          <lpage>922</lpage>
          ,
          <year>2012</year>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Gray</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaudhuri</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosworth</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Layman</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reichart</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venkatrao</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pellow</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pirahesh</surname>
            <given-names>H.</given-names>
          </string-name>
          , “
          <article-title>Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals”</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>29</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>1997</year>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Bailey</surname>
            <given-names>T.L.</given-names>
          </string-name>
          , “
          <article-title>Fitting a mixture model by expectation maximization to discover motifs in biopolymers”</article-title>
          ,
          <source>in: Proceedings of ISMB</source>
          <year>1994</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>1994</year>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Golfarelli</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maio</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizzi</surname>
            <given-names>S.</given-names>
          </string-name>
          , “
          <article-title>The Dimensional Fact Model: A Conceptual Model for Data Warehouses”</article-title>
          ,
          <source>International Journal of Cooperative Information Systems</source>
          <volume>7</volume>
          (
          <issue>2-3</issue>
          ), pp.
          <fpage>215</fpage>
          -
          <lpage>247</lpage>
          ,
          <year>1998</year>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <article-title>Mondrian - Pentaho Community</article-title>
          , http://community.pentaho.com/projects/mondrian/,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <article-title>The R Project for Statistical Computing</article-title>
          , https://www.r-project.org/,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Agrawal</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>El Abbadi</surname>
            <given-names>A.</given-names>
          </string-name>
          , “
          <article-title>Big data and cloud computing: current state and future opportunities”</article-title>
          ,
          <source>in: Proceedings of EDBT</source>
          <year>2011</year>
          , pp.
          <fpage>530</fpage>
          -
          <lpage>533</lpage>
          ,
          <year>2011</year>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Xia</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>L.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinel</surname>
            <given-names>A.</given-names>
          </string-name>
          , “
          <article-title>Internet of things”</article-title>
          ,
          <source>International Journal of Communication Systems</source>
          <volume>25</volume>
          (
          <issue>9</issue>
          ), p.
          <fpage>1101</fpage>
          ,
          <year>2012</year>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Dean</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            <given-names>S.</given-names>
          </string-name>
          , “
          <article-title>MapReduce: simplified data processing on large clusters”</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>51</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>107</fpage>
          -
          <lpage>113</lpage>
          ,
          <year>2008</year>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Cannataro</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuzzocrea</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pugliese</surname>
            <given-names>A.</given-names>
          </string-name>
          , “
          <article-title>XAHM: an adaptive hypermedia model based on XML”</article-title>
          ,
          <source>in: Proceedings of SEKE</source>
          <year>2002</year>
          , pp.
          <fpage>627</fpage>
          -
          <lpage>634</lpage>
          ,
          <year>2002</year>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Cuzzocrea</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leung</surname>
            <given-names>C.K.-S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>MacKinnon</surname>
            <given-names>R.K.</given-names>
          </string-name>
          , “
          <article-title>Mining constrained frequent itemsets from distributed uncertain data”</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>37</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>117</fpage>
          -
          <lpage>126</lpage>
          ,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Aliguliyev</surname>
            <given-names>R.M.</given-names>
          </string-name>
          , “
          <article-title>Performance Evaluation of Density-based Clustering Methods”</article-title>
          ,
          <source>Information Sciences</source>
          <volume>179</volume>
          (
          <issue>20</issue>
          ), pp.
          <fpage>3583</fpage>
          -
          <lpage>3602</lpage>
          ,
          <year>2009</year>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Hassani</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spaus</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuzzocrea</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seidl</surname>
            <given-names>T.</given-names>
          </string-name>
          , “
          <article-title>I-HASTREAM: Density-Based Hierarchical Clustering of Big Data Streams and Its Application to Big Graph Analytics Tools”</article-title>
          ,
          <source>in: Proceedings of CCGrid</source>
          <year>2016</year>
          , pp.
          <fpage>656</fpage>
          -
          <lpage>665</lpage>
          ,
          <year>2016</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>