<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>C-NBC: Neighborhood-Based Clustering with Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Piotr Lasek</string-name>
          <email>lasek@ur.edu.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Computer Science, University of Rzeszów</institution>
          <addr-line>ul. Prof. St. Pigonia 1, 35-310 Rzeszów</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Clustering is one of the most important methods of data mining. It is used to identify unknown yet interesting and useful patterns or trends in datasets. There are different types of clustering algorithms, such as partitioning, hierarchical, grid- and density-based methods. In general, clustering methods are considered unsupervised; however, in recent years a new branch of clustering algorithms has emerged, namely constrained clustering algorithms. By means of so-called constraints, it is possible to incorporate background knowledge into clustering algorithms, which usually leads to better performance and accuracy of clustering results. Over the last years, a number of clustering algorithms employing different types of constraints have been proposed, and most of them extend existing partitioning and hierarchical approaches. Among density-based methods using constraints, algorithms such as C-DBSCAN, DBCCOM and DBCluC have been proposed. In this paper we offer a new algorithm, C-NBC, which combines the known neighborhood-based algorithm (NBC) with instance-level constraints.</p>
      </abstract>
      <kwd-group>
        <kwd>data mining</kwd>
        <kwd>clustering</kwd>
        <kwd>semi-supervised learning</kwd>
        <kwd>constraints</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Clustering is one of the best-known and most often used data mining methods. Its goal is to
assign data objects (or points) to clusters so that objects assigned
to the same cluster are more similar to each other than to objects assigned to other
clusters [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Clustering algorithms can operate on different types of data sources such
as databases, graphs, text, multimedia, or on any dataset containing objects that can
be described by a set of features or relationships [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since clustering algorithms do
not require any external knowledge to be run (except parameters such as k in the
k-Means algorithm), the process of clustering, in contrast to classification, is also often
referred to as unsupervised learning. However, there has always been a natural
need to incorporate already collected knowledge into algorithms to make them better
both in terms of efficiency and quality of results. This need has led to the construction
of a new branch of clustering algorithms based on so-called constraints.
      </p>
      <p>
        Constraint-based clustering algorithms exploit the fact that in many applications
the domain knowledge, in the form of labeled objects, is already known or can be
easily specified by domain experts. Moreover, in some cases such knowledge can be
detected automatically. Initially, researchers focused on algorithms that incorporated
pairwise constraints on cluster membership or learned distance metrics. Subsequent
research was related to algorithms that used many different kinds of domain
knowledge [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        The major contribution of this work is a constrained
neighborhood-based clustering algorithm called C-NBC which combines the known NBC
algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and instance-level constraints such as must-link and cannot-link.
      </p>
      <p>This paper is divided into five sections. In Section 1 we have given a brief
introduction to clustering with constraints. Next, in Section 2, we briefly describe the most
common types of constraints and focus on instance-level constraints, namely must-link
and cannot-link constraints. In Section 3, the Neighborhood-Based Clustering (NBC)
algorithm is recalled. Further, in Section 4 we present the proposed C-NBC algorithm.
Conclusions are drawn and further research is briefly discussed in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Instance-Level Constraints</title>
      <p>
        In constrained clustering, background knowledge can be incorporated into algorithms
by means of different types of constraints. Over the years, different methods of
using constraints in clustering algorithms have been developed [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Constraint-based
methods proposed so far employ techniques such as modifying the clustering
objective function to include a penalty for violating specified constraints [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], clustering
using conditional distributions in an auxiliary space, enforcing all constraints to be
satisfied during the clustering process [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], or determining clusters and constraints based on
neighborhoods derived from already available labeled examples [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. On the other
hand, in the distance-based methods, the distance measure is designed so that it
satisfies the given constraints [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. A few density-based constrained algorithms, such as
C-DBSCAN [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], DBCCOM [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] or DBCluC [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], have also been proposed.
      </p>
      <p>
        Several types of constraints are known, but the hard instance-level constraints seem
to be the most useful, since the incorporation of just a few constraints of this type can
increase clustering accuracy as well as decrease runtime. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] the authors introduced two
kinds of instance-level constraints: the must-link and cannot-link constraints. These
constraints are simple, yet they have interesting properties. For example,
must-link constraints are symmetrical, reflexive and transitive, similarly to an equivalence
relation. Generally, if two points, say p and q, are in a must-link relationship (or, in
other words, are connected by a must-link constraint), then these points, in the process
of clustering, must be assigned to the same cluster. On the other hand, if two points,
say p and q, are in a cannot-link relationship (or are connected by a cannot-link
constraint), then these points must not be assigned to the same cluster.
      </p>
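The transitivity of must-link constraints noted above has a practical consequence: must-link pairs can be merged into blocks, and a cannot-link pair falling inside one block makes the constraint set infeasible. A minimal sketch of this idea, using a union-find structure (the helper names are illustrative, not taken from the paper):

```python
def find(parent, x):
    # Path-compressing find: returns the representative of x's block.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, x, y):
    # Merge the blocks containing x and y.
    parent[find(parent, x)] = find(parent, y)

def feasible(points, must_links, cannot_links):
    # A constraint set is infeasible if some cannot-link pair ends up
    # in the same must-link block (transitive closure of must-links).
    parent = {p: p for p in points}
    for a, b in must_links:
        union(parent, a, b)
    return all(find(parent, a) != find(parent, b) for a, b in cannot_links)

print(feasible("abcd", [("a", "b"), ("b", "c")], [("a", "d")]))  # True
print(feasible("abcd", [("a", "b"), ("b", "c")], [("a", "c")]))  # False
```

Here the implied must-link between a and c (via b) conflicts with the cannot-link (a, c) in the second call.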
      <p>In the next part of the paper, when referring to must-link and cannot-link
constraints, we will use the notations presented in Table 1.
</p>
    </sec>
    <sec id="sec-3">
      <title>Neighborhood-Based Clustering</title>
      <p>
        The Neighborhood-Based Clustering (NBC) algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] belongs to the group of
density-based clustering algorithms. The characteristic feature of NBC is its ability to
measure relative local densities. Hence, it is capable of discovering clusters of
different local densities and of arbitrary shape. Below we recall the key definitions related
to the NBC algorithm which will be used in the sequel.
      </p>
      <sec id="sec-3-1">
        <title>Definition 1. (ε-neighborhood, or briefly NB(p))</title>
        <p>The ε-neighborhood of a point p, NB(p), is the set of all points in dataset D
that are distant from p by no more than ε; that is,
NB(p) = {q ∈ D : dist(p, q) ≤ ε}, where dist is a distance function.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Definition 2. (k-neighborhood, or kNB(p) in brief)</title>
        <p>The k-neighborhood of a point p, kNB(p), is the set of the k points of D
nearest to p.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Definition 3. (punctured k-neighborhood, or briefly kNB+(p))</title>
        <p>The punctured k-neighborhood of a point p, kNB+(p), is equal to kNB(p) \ {p};
that is, the k-neighborhood of p with p itself excluded.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Definition 4. (reversed punctured k-neighborhood of a point, or R-kNB+(p) in brief)</title>
        <p>The reversed punctured k-neighborhood of a point p, R-kNB+(p), is the set of all
points q in dataset D such that p ∈ kNB+(q); that is:
R-kNB+(p) = {q ∈ D : p ∈ kNB+(q)}.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Definition 5. (neighborhood-based density factor of a point)</title>
        <p>The neighborhood-based density factor of a point p, NDF(p), is defined as
NDF(p) = |R-kNB+(p)| / |kNB+(p)|.</p>
        <p>Points having a value of the NDF factor equal to or greater than 1
are considered dense.</p>
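The definitions above can be turned into a short brute-force sketch of the NDF computation (illustrative names, not the paper's code; ties in the k-neighborhood are ignored here, and a real implementation would use a spatial index rather than sorting all distances):

```python
from math import dist

def knb_plus(data, i, k):
    # Punctured k-neighborhood: indices of the k points nearest to
    # data[i], with the point itself excluded (Definitions 2 and 3).
    order = sorted((j for j in range(len(data)) if j != i),
                   key=lambda j: dist(data[i], data[j]))
    return set(order[:k])

def ndf(data, k):
    # NDF(p) = |R-kNB+(p)| / |kNB+(p)| (Definitions 4 and 5);
    # points with NDF >= 1 are considered dense.
    n = len(data)
    neigh = [knb_plus(data, i, k) for i in range(n)]
    reversed_neigh = [{j for j in range(n) if i in neigh[j]}
                      for i in range(n)]
    return [len(reversed_neigh[i]) / len(neigh[i]) for i in range(n)]

# Three mutually close points and one outlier: the outlier appears in
# nobody's neighborhood, so its NDF is 0.
print(ndf([(0, 0), (0, 1), (1, 0), (10, 10)], 2))  # [1.0, 1.5, 1.5, 0.0]
```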
        <p>In order to find clusters, NBC starts by calculating the value of the NDF factor for
each point p in a dataset D. Next, for each point p, the value of NDF(p) is
checked. If it is greater than or equal to 1, then p is assigned to the currently created
cluster (identified by the value of the cluster identifier). Next, the temporary variable DPSet,
used for storing references to dense points, is cleared, and each point q belonging to
kNB+(p) is assigned to the same cluster. If NDF(q) is greater than or equal to 1, then q is also
added to DPSet. Otherwise, q is omitted and the next point from kNB+(p) is analyzed.
Further, for each point s from DPSet, kNB+(s) is computed. All unclassified
points belonging to kNB+(s) are assigned to the current cluster, and points having values of NDF
greater than or equal to 1 are added to DPSet. Next, s is removed from DPSet. When
DPSet is empty, the cluster identifier is incremented and the next unclassified point from D is
analyzed. Finally, if there are no more points in D having values of the NDF factor
greater than or equal to 1, then all unclassified points in D are marked as NOISE.</p>
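The expansion loop described above can be sketched in a few lines. This is a simplified, self-contained reimplementation under the same assumptions as the NDF sketch (brute-force neighborhoods, ties ignored); the variable names follow the description rather than the original pseudo-code:

```python
from math import dist

def nbc(data, k):
    n = len(data)
    # Brute-force punctured k-neighborhoods and NDF factors.
    knb = [set(sorted((j for j in range(n) if j != i),
                      key=lambda j: dist(data[i], data[j]))[:k])
           for i in range(n)]
    rknb = [{j for j in range(n) if i in knb[j]} for i in range(n)]
    ndf = [len(rknb[i]) / len(knb[i]) for i in range(n)]

    label = [None] * n          # None means UNCLASSIFIED
    cid = 0                     # current cluster identifier
    for p in range(n):
        if label[p] is not None or ndf[p] < 1:
            continue            # only unclassified dense points seed clusters
        label[p] = cid
        dpset = [p]             # queue of dense points of the current cluster
        while dpset:
            s = dpset.pop()
            for q in knb[s]:
                if label[q] is None:
                    label[q] = cid
                    if ndf[q] >= 1:
                        dpset.append(q)   # dense points keep expanding
        cid += 1
    # Points never reached by any dense point are marked as noise.
    return ["NOISE" if l is None else l for l in label]

# Two well-separated unit squares yield two clusters.
print(nbc([(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)], 3))
```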
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Clustering with Constraints</title>
      <p>
        In this section we present our new neighborhood-based constrained clustering algorithm
called C-NBC. The algorithm is based on the NBC algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] but uses both
must-link and cannot-link constraints for incorporating knowledge into the algorithm.
      </p>
      <p>The C-NBC algorithm employs the same definitions as NBC, which are used in the
process of clustering to assign points to appropriate clusters or mark them as noise. In
NBC, three kinds of points can be distinguished: unclassified, classified and noise
points. In our proposal, we introduce another kind of point, namely the deferred point
(Definition 6).</p>
      <p>Definition 6. (deferred point) A point p is called deferred if it is involved in a
cannot-link relationship with any other point or it belongs to a punctured k-neighborhood
kNB+(q), where q is any point involved in a cannot-link relationship.</p>
      <p>C-NBC (Figure 1) can be divided into two main steps. In the first step the
algorithm works very similarly to NBC: it calculates the NDF factors and performs
clustering. The difference between C-NBC and NBC is that the former additionally
determines deferred points and has to deal with must-link constraints when building
clusters. Although in the first step the deferred points are not assigned to any
cluster, these points are still used to calculate the NDF factors. In the second step,
C-NBC allocates the deferred points to appropriate clusters. This is done
by means of the AssignDeferredPointsToClusters function (Figure 2) which assigns
points to clusters after checking whether such an assignment is feasible. The detailed
description of C-NBC is provided below. The key variables and notations related to
C-NBC are explained in Table 2.</p>
      <p>The key variables and notations used below are the following: an auxiliary
integer variable, ClusterId, stores the currently created cluster's identifier;
kNB+(p) refers to the punctured k-neighborhood of a point p; NDF(p) refers to the
value of the NDF factor associated with a point p; an auxiliary set, D', stores the
deferred points; and a variable, DPSet, stores dense points and is used in the
iterative process of assigning points to clusters.</p>
      <sec id="sec-4-2">
        <title>The C-NBC Algorithm</title>
        <p>Algorithm C-NBC(D, k, ML, CL)
Input: D - the input, not yet clustered, dataset;
k - the parameter of the C-NBC algorithm;
ML, CL - the sets of pairs of points connected by must-link or cannot-link constraints
Output: the clustered set D with clusters satisfying cannot-link and must-link constraints.</p>
        <p>D' = ∅; ClusterId = 0; // initialization of the set of deferred points and the cluster identifier
label all points in D as UNCLASSIFIED;
CalcNDF(D, k); // calculate values of the NDF factor for every point in D
for each point p involved in any constraint from ML or CL do
  label p and points in kNB+(p) as DEFERRED // label points as DEFERRED
  add them to D'; // add DEFERRED points to D'
endfor
// expanding a currently created cluster</p>
        <p>The C-NBC algorithm starts with the CalcNDF function. After calculating the NDF
factors for each point from D, the deferred points are determined by scanning the pairs of
cannot-link connected points. Such points are added to the auxiliary set D'.</p>
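The determination of deferred points follows Definition 6 directly: every point involved in a constraint, together with the points of its punctured k-neighborhood, is labeled as deferred. A minimal sketch (hypothetical helper name, with neighborhoods passed in precomputed):

```python
def find_deferred(knb_plus, constraints):
    # knb_plus: dict mapping a point to the set of points in its
    # punctured k-neighborhood; constraints: iterable of point pairs.
    deferred = set()
    for a, b in constraints:
        for p in (a, b):
            deferred.add(p)          # the constrained point itself
            deferred |= knb_plus[p]  # and its punctured k-neighborhood
    return deferred

# A single cannot-link between points 0 and 2 defers them and their neighbors.
knb = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
print(find_deferred(knb, [(0, 2)]))  # {0, 1, 2, 3}
```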
        <p>Then, the clustering process is performed in the following way: for each point p
which was not marked as DEFERRED, it is checked whether NDF(p) is less than 1. If
it is, then p is omitted and the next point from D is checked. If NDF(p) is greater than
or equal to 1, then p, as a dense point, is assigned to the currently-created cluster
identified by the current value of ClusterId.</p>
        <p>Next, the temporary variable DPSet is cleared and each non-deferred point q
belonging to kNB+(p) is assigned to the currently-created cluster identified by
the current value of the ClusterId variable. Additionally, if NDF(q) is greater than or
equal to 1, then q is added to DPSet, as are all dense points which are in a must-link
relation with q.</p>
        <p>Function AssignDeferredPointsToClusters(D, D', k, CL)
Input: D - the input, partially clustered, dataset;
D' - the set of points marked as deferred;
k - the parameter of the C-NBC algorithm;
CL - the set of cannot-link constraints
Output: the clustered set D with clusters satisfying cannot-link and must-link constraints
// the contents of D' are saved in a temporary set</p>
        <p>Next, for each unclassified point s from DPSet, its punctured
k-neighborhood kNB+(s) is determined. Each point q which belongs to this neighborhood and has
not been labeled as deferred is assigned to the currently created cluster and, if its
value of NDF is equal to or greater than 1, is added to DPSet. Moreover, all dense
points which are in a must-link relation with q are added to DPSet as well. Later, s is
removed from DPSet and the next point from DPSet is processed. When DPSet is
emptied, ClusterId is incremented.</p>
        <p>After all points from D are processed, unclassified points are marked as noise by
setting the values of their label attribute to NOISE. Then, in order to process
the deferred points, the AssignDeferredPointsToClusters function is invoked. The
function works as follows: for each deferred point p, it finds the nearest point q
assigned to any cluster and checks whether it is possible (in accordance with the
cannot-link constraints) to assign p to the same cluster as q. Additionally, the function
checks that the assignment of p to a specific cluster will not violate previous
assignments of deferred points.</p>
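The behaviour of the deferred-point assignment described above can be sketched as follows. This is a simplified, hypothetical reimplementation: must-link handling is omitted and only the cannot-link feasibility check against already-made assignments is shown:

```python
from math import dist

def assign_deferred(data, labels, deferred, cannot_links):
    # labels maps point index -> cluster id; deferred points are absent from it.
    partners = {}
    for a, b in cannot_links:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    for d in deferred:
        # Candidate anchor points: already clustered points, nearest first.
        for q in sorted(labels, key=lambda q: dist(data[d], data[q])):
            cluster = labels[q]
            # Feasible iff no cannot-link partner of d already sits in
            # this cluster (this also respects earlier deferred assignments).
            if all(labels.get(p) != cluster for p in partners.get(d, ())):
                labels[d] = cluster
                break
    return labels

# Point 4 is nearest to cluster 0, but a cannot-link with point 0
# (a member of cluster 0) pushes it into cluster 1 instead.
result = assign_deferred([(0, 0), (0, 1), (5, 0), (5, 1), (0.4, 0.5)],
                         {0: 0, 1: 0, 2: 1, 3: 1}, [4], [(4, 0)])
print(result[4])  # 1
```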
        <p>
          In order to test the efficiency of the proposed C-NBC algorithm we performed a
number of basic experiments using the following datasets: a manually created dataset
containing 2800 data points, the well-known four-dimensional iris dataset [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and the
birch dataset [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The main goal of the experiments was to test the influence of the
number of constraints used in the process of clustering on the efficiency of the
proposed algorithm. In Table 2 we compare the run-times of the NBC and C-NBC
algorithms. Both implementations of the algorithms employ the same index structure,
namely the R-Tree [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. When performing the experiments with C-NBC we
varied the number of must-link and cannot-link constraints from 2 to 250, except for the
iris dataset, which was not numerous enough to use more than 50 constraints of each
type. Since additional operations must be performed, it was clear to us that the
number of constraints would have a negative impact on the efficiency of the
algorithm. However, we did not observe any critical increase (at most
about 2 times for the largest dataset with 250 must-link and 250 cannot-link
constraints) in the run-time of clustering using the C-NBC algorithm. For this reason,
similarly to NBC, C-NBC can be considered effective, efficient, and easy to use.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and further research</title>
      <p>
        In recent years a number of algorithms employing different kinds of constraints have
been proposed. The most commonly used constraints are instance-level constraints, which are
widely employed to enrich clustering algorithms with additional background
knowledge. However, among density-based clustering algorithms, so far only the
DBSCAN algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] has been adapted to use instance-level constraints.
      </p>
      </p>
      <p>In this paper we have presented C-NBC, a density-based algorithm for clustering
under instance-level constraints which is based on the known NBC algorithm. Our
preliminary experiments confirm that the algorithm performs in line with the
assumptions: must-link connected points are assigned to the
same clusters and cannot-link connected points are assigned to different clusters.
Moreover, for the datasets tested, it turns out that constraints have little effect on the
efficiency of the algorithm. Nevertheless, in our future research we would like to
focus further on the performance of constraint-based clustering and on proposing other
methods of enforcing constraints during the clustering process for larger and
high-dimensional datasets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. J. Han,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kamber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <source>Data Mining: Concepts and Techniques</source>
          , 3rd ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Davidson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wagstaff</surname>
          </string-name>
          ,
          <source>Constrained Clustering: Advances in Algorithms, Theory, and Applications</source>
          , 1st ed.,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>I.</given-names>
            <surname>Davidson</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Basu</surname>
          </string-name>
          , “
          <article-title>A Survey of Clustering with Instance Level Constraints,”</article-title>
          <source>ACM T. Knowl. Disc. Data</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spiliopoulou</surname>
          </string-name>
          , and E. Menasalvas, “C-DBSCAN:
          <article-title>Density-Based Clustering with Constraints,”</article-title>
          <source>in RSFDGr'07: Proc. of the International Conf. on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing held in JRS07</source>
          ,
          <year>2007</year>
          , vol.
          <volume>4481</volume>
          , pp.
          <fpage>216</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          , “
          <article-title>A Neighborhood-Based Clustering Algorithm,” in Advances in Knowledge Discovery and Data Mining SE - 43</article-title>
          , vol. 3518,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cheung</surname>
          </string-name>
          , and H. Liu, Eds. Springer Berlin Heidelberg,
          <year>2005</year>
          , pp.
          <fpage>361</fpage>
          -
          <lpage>371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>K.</given-names>
            <surname>Wagstaff</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Cardie</surname>
          </string-name>
          , “
          <article-title>Clustering with Instance-level Constraints,”</article-title>
          <source>in Proceedings of the Seventeenth International Conference on Machine Learning</source>
          ,
          <year>2000</year>
          , pp.
          <fpage>1103</fpage>
          -
          <lpage>1110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>I.</given-names>
            <surname>Davidson</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravi</surname>
          </string-name>
          , “
          <article-title>Clustering With Constraints: Feasibility Issues and the k-Means Algorithm,”</article-title>
          <source>in Proceedings of the 2005 SIAM Int. Conf. on Data Mining</source>
          , pp.
          <fpage>138</fpage>
          -
          <lpage>149</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>K.</given-names>
            <surname>Wagstaff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cardie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rogers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Schrödl</surname>
          </string-name>
          , “
          <article-title>Constrained K-means Clustering with Background Knowledge,”</article-title>
          <source>in Proceedings of the Eighteenth International Conference on Machine Learning</source>
          ,
          <year>2001</year>
          , pp.
          <fpage>577</fpage>
          -
          <lpage>584</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>S.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Mooney</surname>
          </string-name>
          , “
          <article-title>Semi-supervised Clustering by Seeding,”</article-title>
          <source>in Proceedings of the 19th International Conference on Machine Learning</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>T.</given-names>
            <surname>Hertz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bar-Hillel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Weinshall</surname>
          </string-name>
          , “
          <article-title>Boosting Margin Based Distance Functions for Clustering,</article-title>
          ” in
          <source>Proceedings of the Twenty-First International Conference on Machine Learning</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>393</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>H.</given-names>
            <surname>Chang</surname>
          </string-name>
          and D.-Y. Yeung, “
          <article-title>Locally Linear Metric Adaptation for Semi-supervised Clustering,”</article-title>
          <source>in Proc. of the 21st International Conf. on Machine Learning</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M.</given-names>
            <surname>Ester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-P.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sander</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          , “
          <article-title>A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,”</article-title>
          <source>in Second International Conference on Knowledge Discovery and Data Mining</source>
          ,
          <year>1996</year>
          , pp.
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Blake</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Merz</surname>
          </string-name>
          , “
          <article-title>UCI Repository of machine learning databases</article-title>
          ,” University of California, http://archive.ics.uci.edu/ml/,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Livny</surname>
          </string-name>
          , “
          <article-title>BIRCH: A New Data Clustering Algorithm and Its Applications,”</article-title>
          <source>Data Min. Knowl. Discov.</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>144</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>A.</given-names>
            <surname>Guttman</surname>
          </string-name>
          , “
          <article-title>R-Trees: A Dynamic Index Structure for Spatial Searching,”</article-title>
          <source>in ACM SIGMOD Int. Conf. on Management of Data - SIGMOD '84</source>
          ,
          <year>1984</year>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Duhan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>DBCCOM: Density Based Clustering with Constraints and Obstacle Modeling</article-title>
          .
          <source>In Contemporary Computing SE - 24</source>
          (Vol.
          <volume>168</volume>
          , pp.
          <fpage>212</fpage>
          -
          <lpage>228</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zaiane</surname>
            ,
            <given-names>O. R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.-H.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Clustering spatial data when facing physical constraints</article-title>
          .
          <source>2002 IEEE International Conference on Data Mining</source>
          ,
          <year>2002</year>
          . Proceedings.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>