<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Collaborative Platform for Advanced Meta-Learning in Health care Predictive Analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Milan Vukicevic</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandro Radovanovic</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joaquin Vanschoren</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulio Napolitano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boris Delibasic</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bonn University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Eindhoven University of Technology, Department of Mathematics and Computer Science</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Belgrade, Faculty of Organizational Sciences</institution>
          ,
          <addr-line>Jove Ilica 154, Belgrade</addr-line>
          ,
          <country country="RS">Serbia</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        Modern medical research and clinical practice are more dependent than ever
on multi-factorial data sets originating from various sources, such as medical
imaging, DNA analysis, patient health records and contextual factors. This data
drives research, facilitates correct diagnoses and ultimately helps to develop
and select the appropriate treatments. The volume and impact of this data
have increased tremendously through technological developments such as
high-throughput genomics and high-resolution medical imaging techniques.
Additionally, the availability and popularity of different wearable health care devices have
allowed the collection and monitoring of fine-grained personal health care data.
The fusion and combination of these heterogeneous data sources has already
led to many breakthroughs in health research and shows high potential for the
development of methods that will push current reactive practices towards
predictive, personalized and preventive health care. This potential is recognized and
has led to the development of many platforms for the collection and statistical
analysis of health care data (e.g. Apple Health, Microsoft HealthVault,
Oracle Health Management, Philips HealthSuite, and EMC Health care Analytics).
However, the heterogeneity of the data, privacy concerns, and the complexity
and multiplicity of health care processes (e.g. diagnoses, therapy control, and
risk prediction) create significant challenges for data fusion, algorithm selection
and tuning. These challenges leave a gap between the actual and the potential
data usage in health care, which prevents a paradigm shift from delayed
generalized medicine to predictive personalized medicine [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As such, a platform for
collaborative and privacy-preserving sharing, analysis and evaluation of health
care data would drastically facilitate the creation of advanced models on
heterogeneous fused data, as well as ensure the reproducibility of results, and provide
a solid basis for the development of algorithm ranking and selection methods
based on collaborative meta-learning.
      </p>
      <p>In this work we present extensions of the OpenML platform, to be
addressed in our future work, that are required to meet the needs of meta-learning in
health care predictive analytics: privacy-preserving sharing of data, workflows
and evaluations, reproducibility of results, and rich meta-data spaces about
both data and algorithms.</p>
      <p>
        OpenML.org [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a collaboration platform designed to organize
datasets, machine learning workflows, models and their evaluations. Currently,
OpenML is not fully distributed, but it can be installed on local instances that
communicate with the main OpenML database using mirroring techniques. The
downside of this approach is that code (machine learning workflows), datasets
and experiments (models and evaluations) are physically kept on local instances, so
users cannot communicate and share. We plan to turn OpenML into a fully
distributed machine learning platform that is accessible from different
data mining and machine learning platforms such as RapidMiner, R, WEKA,
KNIME or similar tools. Such a distributed platform would ease the sharing of
data and knowledge. Currently, regulations and privacy concerns often prevent
hospitals from learning from each other's approaches (e.g. machine learning
workflows), reproducing work done by others (data version control, preprocessing and
statistical analysis), and building models collaboratively.
      </p>
      <p>On the other hand, meta-data such as the type of hospital, the percentage of
readmitted patients or an indicator of emergency treatment, as well as the learned
models and their evaluations, can be shared and have great potential for the
development of a cutting-edge meta-learning system for ranking, selection and
tuning of machine learning algorithms.</p>
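      <p>As a minimal sketch of this idea, the following Python fragment reduces local patient records to aggregate meta-data and model evaluations before anything is shared. The record fields, the function name shareable_metadata and the example values are assumptions made for illustration; they are not part of OpenML or of any existing platform, and small aggregates may still require additional privacy safeguards.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Admission:
    """One local patient record (hypothetical schema); it stays on site."""
    patient_id: str
    readmitted: bool
    emergency: bool


def shareable_metadata(hospital_type: str,
                       admissions: List[Admission],
                       evaluations: Dict[str, float]) -> dict:
    """Reduce local records to aggregate meta-data plus model evaluations."""
    n = len(admissions)
    if n == 0:
        raise ValueError("no admissions to summarize")
    return {
        "hospital_type": hospital_type,
        "n_admissions": n,
        # Only percentages are exported, never the individual records.
        "pct_readmitted": 100.0 * sum(a.readmitted for a in admissions) / n,
        "pct_emergency": 100.0 * sum(a.emergency for a in admissions) / n,
        "evaluations": dict(evaluations),
    }


# Illustrative data only; the scores are made up for the example.
local_records = [
    Admission("p1", True, True),
    Admission("p2", False, True),
    Admission("p3", False, False),
    Admission("p4", False, False),
]
record = shareable_metadata("general", local_records,
                            {"logistic_regression_auc": 0.71})
print(record)
```
      </preformat>
      <p>In this sketch only the returned dictionary would leave the hospital, which is the kind of record a shared meta-learning database could accumulate across institutions.</p>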
      <p>
        The success of meta-learning systems is greatly influenced by the size of
the problem (data) and algorithm spaces, but also by the quality of the data and
algorithm descriptions (meta-features). Thus, we plan to employ domain
knowledge provided by expert and formal sources (e.g. ontologies) in order to extend
the meta-feature space for meta-learning in health care applications. For
example, in meta-analyses of gene expression microarray data, the type of chip is very
important in predicting algorithm performance. Further, for fused data sources
it would be useful to know which type of data contributed to the performance
(electronic health records, laboratory tests, data from wearables etc.). In
contrast to data descriptions, algorithm descriptions are much less analyzed and
applied in the meta-learning process. Recent results [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] showed that descriptions
at the level of algorithm parts (e.g. initialization type and internal evaluation
measures in clustering algorithms) could improve the quality of meta-learning
predictions and additionally identify which algorithm parts really influenced the
overall performance. Hence, we will include component-based algorithm
definitions as meta-features and allow their usage as predictors in meta-learning
systems. The development of such a collaborative meta-learning system would
address different challenging tasks in health care predictive analytics, such as early
diagnostics and risk detection, hospital re-admission prediction and automated
therapy control, with many potential stakeholders: patients, doctors,
hospitals and insurance companies, among others.
      </p>
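      <p>To illustrate how data meta-features and component-based algorithm descriptions could act together as predictors, the sketch below ranks algorithm configurations for a new dataset with a simple nearest-neighbour meta-learner. This is an assumed stand-in technique and not the method of the cited work; the meta-features, component names and scores are invented for the example, and real meta-features would need to be normalized before distances are computed.</p>
      <preformat>
```python
import math
from collections import defaultdict

COMPONENTS = ("init", "measure")  # hypothetical algorithm parts

# (dataset meta-features, component values in COMPONENTS order, score)
# Meta-features: (samples in thousands, fraction of missing values).
EXPERIMENTS = [
    ((1.0, 0.10), ("random", "silhouette"), 0.60),
    ((1.0, 0.10), ("kmeans++", "silhouette"), 0.80),
    ((1.0, 0.10), ("random", "compactness"), 0.58),
    ((1.0, 0.10), ("kmeans++", "compactness"), 0.78),
    ((1.2, 0.12), ("random", "silhouette"), 0.62),
    ((1.2, 0.12), ("kmeans++", "silhouette"), 0.78),
    ((1.2, 0.12), ("random", "compactness"), 0.60),
    ((1.2, 0.12), ("kmeans++", "compactness"), 0.76),
    ((9.0, 0.50), ("random", "silhouette"), 0.90),
    ((9.0, 0.50), ("kmeans++", "silhouette"), 0.40),
]


def rank_algorithms(new_meta, experiments, k=2):
    """Rank configurations by mean score on the k most similar datasets."""
    datasets = sorted({meta for meta, _, _ in experiments},
                      key=lambda meta: math.dist(meta, new_meta))
    neighbours = set(datasets[:k])
    scores = defaultdict(list)
    for meta, config, score in experiments:
        if meta in neighbours:
            scores[config].append(score)
    ranking = [(config, sum(v) / len(v)) for config, v in scores.items()]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


def component_effect(ranking, component):
    """Mean predicted score per value of one algorithm component."""
    index = COMPONENTS.index(component)
    by_value = defaultdict(list)
    for config, score in ranking:
        by_value[config[index]].append(score)
    return {value: sum(v) / len(v) for value, v in by_value.items()}


ranking = rank_algorithms((1.1, 0.11), EXPERIMENTS, k=2)
init_effect = component_effect(ranking, "init")
measure_effect = component_effect(ranking, "measure")
print(ranking[0], init_effect, measure_effect)
```
      </preformat>
      <p>On this toy data the initialization component separates the configurations far more than the evaluation measure does, which is the kind of per-component insight that component-based descriptions are meant to expose.</p>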
      <p>Acknowledgement.
This research was supported by the SNSF Joint Research project (SCOPES), ID:
IZ73Z0_152415.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Olga</given-names>
            <surname>Golubnitschaja</surname>
          </string-name>
          , Judita Kinkorova, and
          <string-name>
            <given-names>Vincenzo</given-names>
            <surname>Costigliola</surname>
          </string-name>
          .
          <article-title>Predictive, preventive and personalised medicine as the hardcore of Horizon 2020: EPMA position paper</article-title>
          .
          <source>EPMA J</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>6</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Joaquin</given-names>
            <surname>Vanschoren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jan N</given-names>
            <surname>van Rijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bernd</given-names>
            <surname>Bischl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Luis</given-names>
            <surname>Torgo</surname>
          </string-name>
          .
          <article-title>OpenML: networked science in machine learning</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          ,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <fpage>49</fpage>
          –
          <lpage>60</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Milan</given-names>
            <surname>Vukicevic</surname>
          </string-name>
          , Sandro Radovanovic, Boris Delibasic, and
          <string-name>
            <given-names>Milija</given-names>
            <surname>Suknovic</surname>
          </string-name>
          .
          <article-title>Extending meta-learning framework for clustering gene expression data with component based algorithm design and internal evaluation measures</article-title>
          .
          <source>International Journal of Data Mining and Bioinformatics</source>
          , in press.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>