<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>May</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Clustering Evaluation in High-Dimensional Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Milos Radovanovic</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>4</volume>
      <issue>2019</issue>
      <abstract>
        <p>Abstract of Invited Presentation</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Clustering evaluation plays an important role in unsupervised learning systems, as it is often necessary to
automatically quantify the quality of generated cluster configurations. This is especially useful for comparing the
performance of different clustering algorithms and for determining the optimal number of clusters in clustering
algorithms that do not estimate it internally. Many clustering quality indexes have been proposed over the years,
and different indexes are used in different contexts. There is no unifying protocol for clustering evaluation, so it
is often unclear which quality index to use in which case.</p>
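      <p>To make the idea of a clustering quality index concrete, the following minimal sketch computes the silhouette index, one widely used internal index, in plain Python. The abstract does not name the indexes that were evaluated, so the choice of silhouette is an assumption made for illustration, and the function names, the toy points and the two labelings are all hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_index(points, labels):
    """Mean silhouette over all points (illustrative choice of index).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    point's score is (b - a) / max(a, b).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = [0, 0, 0, 1, 1, 1]  # labeling that matches the two groups
bad = [0, 1, 0, 1, 0, 1]   # labeling that mixes the two groups

good_score = silhouette_index(points, good)
bad_score = silhouette_index(points, bad)
print(f"good labeling: {good_score:.3f}, mixed labeling: {bad_score:.3f}")
```
]]></preformat>
      <p>In this sketch the labeling that matches the two groups receives the higher score. Comparing algorithms or candidate numbers of clusters follows the same pattern: score each resulting labeling with the index and prefer the one that scores best.</p>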
      <p>In this talk, we review existing clustering quality measures and evaluate them in the challenging context of
high-dimensional data clustering. High-dimensional data is sparse and distances tend to concentrate, possibly
affecting the applicability of various clustering quality indexes. We analyze the stability and discriminative power
of a set of standard clustering quality measures with increasing data dimensionality. Our evaluation shows that
the curse of dimensionality affects different clustering quality indexes in different ways, and that some are to be
preferred when determining clustering quality in many dimensions.</p>
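      <p>The distance concentration effect mentioned above can be illustrated with a small experiment. This is a sketch under stated assumptions, not the evaluation protocol of the talk: points are drawn uniformly from the unit hypercube, and the relative contrast (largest minus smallest distance from a query point, divided by the smallest) is reported for several dimensionalities. The function name, sample size, seed and list of dimensions are hypothetical choices.</p>
      <preformat><![CDATA[
```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min of Euclidean distances
    from one random query point to n_points random points in [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(query, point))))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# Assumed set of dimensionalities, chosen only for illustration.
contrast = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrast.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```
]]></preformat>
      <p>Under these assumptions the relative contrast is much smaller at 1000 dimensions than at 2, meaning that the nearest and farthest points lie at nearly the same distance from the query. Quality indexes built on distance ratios or differences may lose discriminative power in that regime.</p>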
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>