<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Relative Constraints as Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Piotr Lasek</string-name>
          <email>lasek@ur.edu.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krzysztof Lasek</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Computer Science, University of Rzeszow</institution>
          ,
          <addr-line>ul. Prof. Pigonia 1, 35-510 Rzeszow</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science, Warsaw University of Technology</institution>
          ,
          <addr-line>ul. Nowowiejska 15/19, 00-665 Warszawa</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>One of the most commonly used methods of data mining is clustering. Its goal is to identify unknown yet interesting and useful patterns in datasets. Clustering is usually considered unsupervised; however, recent years have shown a tendency toward incorporating external knowledge into clustering methods, making them semi-supervised. Generally, all known types of clustering methods, such as partitioning, hierarchical, grid-based and density-based, have been adapted to use so-called constraints. By means of constraints, background knowledge can be easily used with clustering algorithms, which usually leads to better performance and accuracy of clustering results. In spite of the growing interest in constraint-based clustering, this domain still needs attention. For example, relative constraints have not been widely investigated, yet they seem very promising since they can easily represent domain knowledge. Our work is another step in the research on relative constraints. We have used and simplified the approach presented in [1] to create a mechanism for using relative constraints as features.</p>
      </abstract>
      <kwd-group>
        <kwd>clustering</kwd>
        <kwd>density-based clustering</kwd>
        <kwd>constrained clustering</kwd>
        <kwd>relative constraints</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Normally, clustering proceeds in an unsupervised manner. However, in some
cases it is desirable to incorporate external knowledge into a clustering
algorithm. Let us consider the following example. A system for analysing
the customer data of a GSM company is to be built. The purpose of this
system is to give managers the ability to detect new groups of customers to which
new, customised individual plans could be offered. The number of customers
is large and each of them can be described by a great number (thousands) of
attributes. In order to analyse and cluster the dataset, any clustering algorithm
capable of dealing with large and high-dimensional datasets can be
applied. However, the company employs a number of experienced analysts who
know the market and can give invaluable insights. The question
is how to easily incorporate the analysts’ knowledge into clustering algorithms so
that it enriches the clustering results.</p>
      <p>
        One of the ways of incorporating external knowledge into clustering
algorithms is to use so-called constraints [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Several types of constraints are known.
In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] authors introduced simple yet very popular instance-level constraints,
namely: the must-link and cannot-link constraints. If we say that two points
p0 and p1 are in a must-link relationship (or are connected by a must-link
constraint) then, by means of a clustering process, these points will be assigned to
the same cluster c. On the other hand, if we say that two points p0 and p1 are
in a cannot-link relationship (or are connected by a cannot-link constraint) then
these points will not be assigned to the same cluster c.
      </p>
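      <p>The must-link and cannot-link semantics above can be sketched as a simple feasibility check on a cluster assignment (a minimal illustration of the definitions, not code from the paper; all names are ours):</p>

```python
def satisfies_constraints(labels, must_link, cannot_link):
    """Check a cluster assignment against instance-level constraints.

    labels      -- list of cluster ids, indexed by object
    must_link   -- (i, j) pairs that must share a cluster
    cannot_link -- (i, j) pairs that must not share a cluster
    """
    for i, j in must_link:
        if labels[i] != labels[j]:
            return False
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            return False
    return True

# Objects 0,1 in one cluster, 2,3 in another:
labels = [0, 0, 1, 1]
print(satisfies_constraints(labels, [(0, 1)], [(1, 2)]))  # True
```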
      <p>
        Incorporating just a few constraints of the above types can increase clustering
accuracy as well as decrease runtime [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Asafi and Cohen-Or presented an interesting method of incorporating
instance-level constraints into any clustering algorithm. They proposed to treat
constraints as additional features of a given object. In order to incorporate these
constraints, they alter the original distance matrix, setting the distance
between objects in a must-link relationship to the shortest distance between any two
objects in the input dataset. Then the triangle inequality is restored.
Additionally, in order to take cannot-link constraints into account, they employed the idea
of the diffusion distance, which is used to compute modified distances between
objects connected by cannot-link constraints [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In other words, the process of using
instance-level constraints as features is composed of two steps. First, must-link
constraints are used to modify the distance matrix. Then, by using the
Diffusion Map to compute diffusion distances, cannot-link constraints are taken into
account. The above-mentioned computations are performed using the following
formula:
      </p>
      <p>
        \tilde{D}_{i,j} = \hat{D}_{i,j} + \sum_{c=1}^{N} \alpha D^{(c)}_{i,j},
where \hat{D}_{i,j} is the matrix created by adding must-link constraints to the original
distance matrix D_{i,j}, \alpha is used to scale the distance space from the (-1, 1) interval to
(-\alpha, \alpha) (\alpha is the longest distance in D_{i,j}), and \tilde{D}_{i,j} is the matrix constructed by
adding the D^{(c)}_{i,j} matrices to it, where c = 1, ..., N (N is the number of cannot-link
constraints) and each D^{(c)}_{i,j} matrix represents one cannot-link
constraint. D^{(c)}_{i,j} is called the diffusion distance matrix [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and is computed using the following
formulas:
      </p>
      <p>
        D^{(c)}_{i,j} = |v_i - v_j|,
\quad
v_i = \frac{\phi(i, c_2) - \phi(i, c_1)}{\phi(i, c_2) + \phi(i, c_1)},
where c_1 and c_2 are the objects in a cannot-link relationship and i is
the index of an object in the dataset. \phi is the function given by the following
formula:
\phi(x, y) = |\Psi_t(x) - \Psi_t(y)|,
where
\Psi_t(x) = (\lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), \ldots, \lambda_n^t \psi_n(x))^T,
where \lambda_i are the eigenvalues and \psi_i are the eigenvectors of the input dataset’s
Diffusion Map, n is the number of objects in the dataset and t is a time parameter
which can be determined experimentally [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Diffusion maps are based on the observation that when walking from one
point to another in a dataset, it is more probable to traverse points that are
located nearby than far away. This observation leads to defining the probability of
traversing from point a to point b. As described above, in order to use a Diffusion Map,
several computations have to be performed. First, the distance matrix is built.
Then, the computed distance matrix is normalized using the sums of
its rows. Next, a spectral decomposition is performed to obtain the eigenvalues and
eigenvectors of the normalized matrix. Dimensionality reduction
is performed by omitting the smallest eigenvalues. For this reason, diffusion maps
can be used to deal with high-dimensional datasets by discovering an underlying
manifold in the data and giving a simpler global description of the dataset [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
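      <p>The pipeline just described (kernel from distances, row normalization, spectral decomposition, truncation) can be sketched in a few lines of NumPy (a minimal sketch under a Gaussian-kernel assumption; the parameter names are ours):</p>

```python
import numpy as np

def diffusion_map(X, eps=1.0, n_components=2, t=1):
    """Minimal diffusion-map embedding of the rows of X."""
    # 1. Pairwise squared Euclidean distances.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # 2. Gaussian kernel and row normalization -> Markov matrix P.
    K = np.exp(-D2 / eps)
    P = K / K.sum(axis=1, keepdims=True)
    # 3. Spectral decomposition; sort eigenvalues in decreasing order.
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # 4. Drop the trivial eigenvalue (= 1) and keep the next n_components,
    #    scaling eigenvectors by lambda^t as in Psi_t.
    return (vals[1:n_components + 1] ** t) * vecs[:, 1:n_components + 1]

# Two well-separated pairs of points:
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
Psi = diffusion_map(X)  # shape (4, 2)
```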
    </sec>
    <sec id="sec-2">
      <title>Relative constraints as features</title>
      <p>
        In recent studies relative constraints have gained more and more attention due to the
fact that they can easily represent domain knowledge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Relative constraints
are usually defined as object triples and are written as ab|c,
where a, b and c are objects from the dataset and a is closer to b than to c. The
formula ab|c is equivalent to the following comparison: d(a, b) &lt; d(a, c), where
d is the distance function. The intuition behind relative constraints (compared
to instance-level constraints) is that it may be easier to define relative similarity
between objects and use this knowledge in the clustering process than to strictly
define which objects should or should not be assigned to specific clusters.
      </p>
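      <p>The definition d(a, b) &lt; d(a, c) translates directly into a one-line check (our own illustration with Euclidean distance as d):</p>

```python
import math

def satisfies_relative(points, a, b, c, d=math.dist):
    """True if the relative constraint ab|c holds:
    a is closer to b than to c under distance function d."""
    return d(points[a], points[b]) < d(points[a], points[c])

pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(satisfies_relative(pts, 0, 1, 2))  # True: d(a,b)=1 < d(a,c)=5
```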
      <p>
        Our proposal for using relative constraints combines the ideas presented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In our approach we incorporate relative constraints into the clustering process
by similarly constructing a modified diffusion distance matrix. However, our
method may be considered simpler, because we do not have
to restore the triangle inequality property. Consequently, the resulting distance
matrix is given by the following formula:
      </p>
      <p>
        \tilde{D}_{i,j} = D_{i,j} + \sum_{r=1}^{N} \alpha D^{(r)}_{i,j},
where r = 1, ..., N indexes the set of relative constraints (a set of object triples).</p>
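      <p>Assembling the modified matrix from the per-constraint diffusion distance matrices is then a straightforward sum (a sketch under our reading that α is the longest distance in the original matrix, as in the instance-level case; names are ours):</p>

```python
import numpy as np

def modified_distance_matrix(D, constraint_matrices):
    """D-tilde = D + sum_r alpha * D^(r), alpha = longest distance in D."""
    alpha = D.max()
    Dt = D.astype(float).copy()
    for Dr in constraint_matrices:
        Dt += alpha * Dr
    return Dt

D = np.array([[0.0, 1.0], [1.0, 0.0]])       # original distances
Dr = np.array([[0.0, 0.5], [0.5, 0.0]])      # one constraint's matrix
Dt = modified_distance_matrix(D, [Dr])       # [[0, 1.5], [1.5, 0]]
```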
      <p>Further, in our method, in order to compute the matrix D^{(r)}_{i,j} of diffusion
distances between objects, the following formulas are used:
D^{(r)}_{i,j} = |v_i - v_j|,
\quad
v_i = \frac{\min(\phi(i, a), \phi(i, b)) - \phi(i, c)}{\min(\phi(i, a), \phi(i, b)) + \phi(i, c)},
where a, b and c are points which are in a relative relationship ab|c.</p>
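      <p>Under our reading of the formula above, where the minimum is taken over φ(i, a) and φ(i, b), the relative-constraint feature can be sketched as (illustrative code of ours, mirroring the cannot-link feature but with the min term):</p>

```python
import numpy as np

def relative_feature(Psi, a, b, c):
    """v_i for a relative constraint ab|c, given a diffusion embedding Psi.

    Assumes phi(i, x) = |Psi_t(i) - Psi_t(x)| and that the min in the
    formula ranges over phi(i, a) and phi(i, b).
    """
    phi_a = np.linalg.norm(Psi - Psi[a], axis=1)
    phi_b = np.linalg.norm(Psi - Psi[b], axis=1)
    phi_c = np.linalg.norm(Psi - Psi[c], axis=1)
    m = np.minimum(phi_a, phi_b)
    # v_i = (min(phi_a, phi_b) - phi_c) / (min(phi_a, phi_b) + phi_c)
    return (m - phi_c) / (m + phi_c)

# 1-D embedding of three objects with constraint 0 1 | 2:
Psi = np.array([[0.0], [1.0], [3.0]])
v = relative_feature(Psi, 0, 1, 2)  # v[0] = -1, v[2] = 1
```

Objects at a or b get v = -1 and the object at c gets v = +1, so |v_i - v_j| separates the two sides of the constraint.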
    </sec>
    <sec id="sec-3">
      <title>Experiments and results</title>
      <p>
        In order to test our method we implemented a framework for performing data
clustering experiments. We implemented appropriate functions for
computing diffusion distance matrices, which were later used in the neighborhood-based
clustering algorithm (NBC) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to determine nearest neighbors. We
performed a number of experiments using several well-known benchmark datasets
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Due to the fact that eigenvectors and eigenvalues must be computed in order
to determine the diffusion distance matrix, the overall time efficiency of the
method is low. However, the qualitative results are very promising. Moreover,
in comparison to instance-level constraints, relative constraints can be specified by experts
more easily, since a priori knowledge about the assignment of objects to the
same cluster is not required. The only information necessary to obtain from a
domain expert is the specification of the relative relation among objects.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and further research</title>
      <p>In the near future we are going to focus on making our method more efficient.
Moreover, we want to examine the influence of the t parameter
on the quality and efficiency of our method. Additionally, we would like to test
different core functions used when determining the Diffusion Map and check
their influence on the results of the clustering.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. S. Asafi and D. Cohen-Or, &#8220;Constraints as Features,&#8221; in IEEE Conference on Computer Vision and Pattern Recognition (2013), <year>2013</year>, pp. <fpage>1634</fpage>-<lpage>1641</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. S. Basu, I. Davidson, and K. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications, 1st edition, vol. <volume>45</volume>. <year>2008</year>, pp. <fpage>961</fpage>-<lpage>970</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>I.</given-names>
            <surname>Davidson</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravi</surname>
          </string-name>
          , ”
          <article-title>Clustering With Constraints: Feasibility Issues and the k-Means Algorithm,”</article-title>
          <source>in Proceedings of the 2005 SIAM Int. Conf. on Data Mining</source>
          , pp.
          <fpage>138</fpage>
          -
          <lpage>149</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Schultz</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>Learning a distance metric from relative comparisons</article-title>
          .
          <source>In NIPS 04</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. C. Semple and M. Steel. A supertree method for rooted trees. <source>Discrete Applied Mathematics</source>, <volume>105</volume>(1-3):<fpage>147</fpage>-<lpage>158</lpage>, <year>2000</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. B. Nadler et al. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. <source>arXiv preprint math/0506090</source> (<year>2005</year>).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>De la Porte</surname>
          </string-name>
          , J., et al.
          <article-title>An introduction to diffusion maps</article-title>
          .
          <source>Applied Mathematics Division</source>
          , Department of Mathematical Sciences, University of Stellenbosch, South Africa and Colorado School of Mines, United States of America (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lafon</surname>
          </string-name>
          , Stephane, and
          <string-name>
            <surname>Ann</surname>
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization</article-title>
          .
          <source>Pattern Analysis and Machine Intelligence</source>
          ,
          <source>IEEE Transactions on 28.9</source>
          (
          <year>2006</year>
          ):
          <fpage>1393</fpage>
          -
          <lpage>1403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <surname>Shuigeng</surname>
          </string-name>
          , et al. ”
          <article-title>A neighborhood-based clustering algorithm</article-title>
          .
          <source>” Advances in Knowledge Discovery and Data Mining</source>
          . Springer Berlin Heidelberg,
          <year>2005</year>
          .
          <fpage>361</fpage>
          -
          <lpage>371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. A. Asuncion and D. Newman. &#8220;UCI machine learning repository.&#8221; (<year>2007</year>).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>