<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Correlation Clustering: from Local to Global Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>(Discussion Paper)</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico Mandaglio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Tagarelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Gullo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIMES Dept., University of Calabria</institution>
          ,
          <addr-line>87036 Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>UniCredit</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Given a set of data objects, consider that object pairs are assigned two weights expressing the advantage of putting those objects in the same cluster or in separate clusters, respectively. Correlation clustering partitions the input object set so as to minimize the sum of the intra-cluster negative-type weights plus the sum of the inter-cluster positive-type weights. Existing approximation algorithms provide quality guarantees if the weights are bounded in some way. Regardless of the type, the weight bounds that have been so far studied are local bounds, i.e., constraints that are required to hold for every object pair in isolation. In this paper, we discuss global weight bounds in correlation clustering, and in particular, we derive bounds on edge weights' aggregate functions that are suficient to lead to proved quality guarantees. Our formulation extends the range of applicability of the most prominent existing correlationclustering algorithms thus providing benefits, both theoretical and practical. Also, we showcase our results in a real-world scenario of feature selection for fair clustering.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;min-disagreement correlation clustering</kwd>
        <kwd>probability constraint</kwd>
        <kwd>fair clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Correlation clustering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is an important clustering formulation that has received considerable
attention from theoreticians and practitioners, and found application in several contexts [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Given a set of objects and nonnegative real weights expressing “positive” and “negative”
feeling of clustering any two objects together, min-disagreement correlation clustering (Min-CC)
partitions the input object set so as to minimize the sum of the intra-cluster negative-type
weights plus the sum of the inter-cluster positive-type weights. Min-disagreement correlation
clustering is APX-hard, but eficient constant-factor approximation algorithms exist if the
weights are bounded in some way. The weight bounds tackled so far in the literature are said
local, as they are required to hold for every single object pair.</p>
      <p>
        In this paper, we discuss the main theoretical and experimental results from our study
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], where we introduced the problem of min-disagreement correlation clustering with
global weight bounds, i.e., constraints to be satisfied by the input weights altogether. Our main
contribution is a suficient condition that establishes when any algorithm achieving a certain
Input: Graph  = (, ); nonnegative weights +, − , ∀ ∈ 
Output: Clustering  of 
1:  ← ∅ ,  ′ ← 
2: while  ′ ̸= ∅ do
3: pick a pivot vertex  ∈  ′ uniformly at random
4: add  = {} ∪ { ∈  ′ | (, ) ∈ , + &gt; −} to  and remove  from  ′
approximation under the probability constraint keeps the same guarantee on an input that
violates the constraint. This extends the range of applicability of the most prominent existing
correlation-clustering algorithms, including the popular Pivot, thus providing both theoretical
and practical benefits. Experiments have shown the usefulness of our approach, in terms of
both worthiness of employing existing eficient algorithms, and guidance on the definition of
weights from feature vectors in a task of fair clustering.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Correlation Clustering with local weight bounds</title>
      <p>The input of correlation clustering is a set  of objects, and two nonnegative, real-valued weights
+, − for every (unordered) object pair ,  ∈  . Any “positive” + (resp. “negative” −)
weight expresses the benefit of clustering  and  together (resp. separately). This input can
equivalently be represented as a graph  with vertex set  and edge weights +, −, for all
,  ∈  , and with edge (, ) being drawn only if at least one among + and − is nonzero.</p>
      <p>
        In this work we tackle the problem of min-disagreement correlation clustering:
Problem 1 (Min-CC [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). Given an undirected graph  = (, ), with vertex set  and edge
set  ⊆  ×  , and nonnegative weights +, − ∈ R0+ for all edges  ∈ , find a clustering
(i.e., an injective function expressing cluster-membership)  :  → N+ that minimizes
∑︁
−
+
∑︁
      </p>
      <p>+.
(,)∈,()=()
(,)∈,()̸=()
(1)</p>
      <p>For the sake of presentation, we assume + = − = 0, for all  ∈/ , and non-trivial Min-CC
instances, i.e., + ̸= − , for some  ∈ .</p>
      <p>
        Min-CC is NP-hard [
        <xref ref-type="bibr" rid="ref1 ref5">1, 5</xref>
        ] yet dificult to approximate, being it APX-hard even for
complete graphs and edge weights (+, − ) ∈ {(0, 1), (1, 0)}, ∀ ∈  [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For general (i.e., not
necessarily complete) graphs and unconstrained weights, the best known approximation factor
is (log | |) [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. This factor improves if restrictions on edge weights are imposed. The one
that has received remarkable attention is the probability constraint (pc): A Min-CC instance
is said to satisfy the probability constraint if + + − = 1, for all pairs ,  ∈  .
      </p>
      <p>
        A Min-CC instance obeying the pc necessarily corresponds to a complete graph (otherwise,
any missing edge would violate the pc). Under the pc, Min-CC admits constant-factor guarantees.
The best known approximation factor is 4, achievable – as shown in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] – by Charikar et al.’s
algorithm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. That algorithm is based on rounding the solution to a large linear program (with
a number Ω(| |3) of constraints), thus being feasible only on small graphs.
Algorithm 2 GlobalCC
Input: Graph  = (, ); nonnegative weights +, − , ∀ ∈ , satisfying Theorem 1; algorithm A
achieving  -approximation guarantee for Min-pc-CC
Output: Clustering  of 
1: choose ,  s.t.  ∈ [Δ, + + − ] {Theorem 1}
2: compute   =  (+ + −) −  , ∀,  ∈ 
3: compute  ± = 1 (︀   ± −  2 )︀ , ∀,  ∈  (using ,  defined in Step 1)
4:  ← run A on Min-pc-CC instance ⟨′ = (,  ×  ), { +,  − }∈ ×  ⟩
      </p>
      <p>
        Here, we are particularly interested in the Pivot algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], due to its theoretical properties
– it achieves a factor-5 expected guarantee for Min-CC under the pc – and practical benefits – it
takes (||) time, and is easy-to-implement. Pivot simply picks a random vertex , builds a
cluster as composed of  and all the vertices  such that an edge with + &gt; − exists, and
removes that cluster. The process is repeated until the graph has become empty (Algorithm 1).
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Correlation Clustering with global weight bounds</title>
      <p>The weight bounds that have been so far studied are local bounds, i.e., constraints that are
required to hold for every object pair in isolation. In this work, we are the first to consider global
weight bounds in min-disagreement correlation clustering. We derive bounds on edge weights’
aggregate functions that are suficient to lead to proved quality guarantees. More specifically,
for a Min-CC instance ⟨ = (, ), {+, − }∈⟩ we define:
+ = ︁( |2 | )︁ − 1 ∑︀∈ +,
− = ︁( |2 | )︁ − 1 ∑︀∈ − ,
Δ = max∈ |+ − − |</p>
      <p>The main theoretical result of this work is described by the following theorem.
Theorem 1. If the condition + + − ≥ Δ holds for a Min-CC instance , then it is
possible to construct a Min-CC instance ′ (in linear time and space) such that () the probability
constraint holds on ′, and () an  -approximate clustering on ′ (i.e., a clustering whose
objectivefunction value is no more than  times ′’s optimum) is an  -approximate clustering on  too.</p>
      <p>Let Min-pc-CC denote the version of Min-CC operating on instances that satisfy the pc.
According to Theorem 1, if + + − ≥ Δ holds for a Min-CC instance, then any
 -approximation algorithm A for Min-pc-CC can be employed – as a black box – to get an
 -approximate solution to that Min-CC instance. The algorithm for doing so is simple: given
a Min-CC instance , get a Min-pc-CC instance via strict approximation-preserving (sap)
reduction, and run the black-box algorithm A on it (Algorithm 2). Being the reduction sap,
Algorithm 2 on input ⟨, A⟩ achieves factor- guarantee on .</p>
      <p>
        A noteworthy consequence of this result is that, if a Min-CC instance  satisfies our condition,
then the Pivot algorithm can be used to get (in linear time and space) a clustering achieving
a 5-approximation guarantee on .1 This corresponds to extending the range of validity of
1A probability-constraint-compliant Min-CC instance ′ is derivable from  in linear time and space (cf. Th. 1 ()). Pivot on
′ yields a 5-approximate clustering [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A 5-approximate clustering on ′ is a 5-approximate clustering on  (cf. Th. 1 ()).
Pivot’s guarantee beyond the probability constraint: our global-weight-bounds condition now
sufices for the 5-approximation to hold. A key advantage is that our condition is more likely to
be satisfied than the probability constraint; for instance, it may happen that a bunch of edges
are missing in the graph (thus violating the probability constraint), but, if our condition holds,
one can get a 5-approximate clustering with Pivot. We point out that our result is general and
holds for any Min-CC algorithm achieving approximation guarantees under the probability
constraint. However, the contextualization to the Pivot algorithm is relevant and worth to be
exploited, since Pivot achieves the best tradeof between quality guarantees and eficiency.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Analysis of the global-weight-bounds condition</title>
      <p>
        Our result can be exploited to quickly yet easily recognize whether employing
probabilityconstraint-aware approximation algorithms is a worth choice even if the probability constraint
is not met. As an example, consider a graph that violates the probability constraint. So far,
that graph would have likely been handled with linear-programming (LP) algorithms [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], as
they achieve (factor-(log | |)) approximation guarantees on general graphs/weights (whereas
algorithms like Pivot are just heuristics if the probability constraint does not hold). Instead,
our condition can be used as an indicator of whether Pivot can still achieve guarantees even if
the probability constraint is violated, thus being preferred over the less eficient LP algorithms.
Our experiments confirmed this theoretical finding, i.e. a better fulfilment of our condition
corresponds to better performance of Pivot with respect to the LP algorithms, and vice versa.
Settings. We selected four real-world graphs, namely Karate, Dolphins, Adjnoun, and Football.2
Note that the small size of such graphs is not an issue because this evaluation stage involves,
among others, LP correlation-clustering algorithms, whose Ω(| |3) time complexity makes them
unafordable for graphs larger than that. We augmented these graphs with artificially-generated
edge weights, to test diferent levels of fulfilment of our global-weight-bounds condition stated in
Theorem 1. We controlled the degree of compliance of the condition by a target ratio parameter,
defined as  = Δ/(+ + − ). The condition is satisfied if and only if  ∈ [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ], and
smaller target-ratio values correspond to better fulfilment of the condition, and vice versa.
      </p>
      <p>Given a desired target ratio, edge weights are generated as follows. First, all weights are
drawn uniformly at random from a desired [, ] range. Then, the weights are adjusted in
a two-step iterative fashion, until the desired target ratio is achieved: () the maximum gap
Δ is fixed, the weights are changed for pairs that do not contribute to Δ so as to reflect
a change in +, − ; () +, − are fixed, Δ is updated by randomly modifying
pairs that contribute to Δ. Finally, weight pairs are randomly assigned to the edges.</p>
      <p>
        We compared the performance of Pivot (Algorithm 1 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) to one of the state-of-the-art
algorithms achieving factor-(log | |) guarantee on general graphs/ weights [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We dub the
latter LP+R, alluding to the fact that it rounds the solution of a linear program. We evaluated
correlation-clustering objective, number of output clusters, and runtimes of these algorithms.
Results. Figure 1 shows the quality (i.e., Min-CC objective) of the produced clusterings, with the
bottom-left insets reporting the ratio between the performance of Pivot and LP+R. Results refer
to target ratios  varied from [
        <xref ref-type="bibr" rid="ref3">0, 3</xref>
        ], with stepsize 0.1, and weights generated with  = 0,  = 1.
      </p>
      <sec id="sec-4-1">
        <title>2Publicly available at http://konect.cc/networks/</title>
        <p>For each target ratio, all measurements correspond to averages over 10 weight-generation runs,
and each of such runs corresponds to averages over 50 runs of the tested algorithms.</p>
        <p>
          The main goal here is to have experimental evidence that a better fulfilment of our global
condition leads to Pivot’s performance closer to LP+R’s one, and vice versa. Figure 1 confirms
the above: in all datasets, Pivot performs more closely to LP+R as the target ratio gets smaller.
In general, Pivot performs similarly to LP+R for  ∈ [
          <xref ref-type="bibr" rid="ref1">0, 1</xref>
          ], while being outperformed for  &gt; 1.
This conforms with the theory: on these small graphs, factor-5 Pivot’s approximation is close
to factor-(log | |) LP+R’s approximation. Pivot achieves the best performance on Football,
where it outperforms LP+R even if the condition is not met. This is motivated by Football’s
higher clustering coeficient and average degree, which help Pivot sample vertices (and, thus,
build clusters) in dense regions of the graph. This is confirmed by the number of clusters
(Table 1-(left)): Pivot yields more clusters than LP+R on all datasets but Football.
        </p>
        <p>Concerning execution times, Pivot runs in less than one second, and as expected, it is extremely
faster than LP+R, whose runtimes are about 2 seconds in Karate, 37 in Dolphins, and above 770
in Adjnoun and Football.3 The ineficiency of LP+R further emphasizes the importance of our
result in extending the applicability of faster algorithms like Pivot.</p>
        <p>We complement this stage of evaluation by testing diferent graph densities, and for target
ratios  = 1 (borderline satisfaction of our condition) and  = 20 (far fulfilment of the condition).
Again, the results (not shown) meet the expectations: in terms of clustering quality, Pivot
performs closely to or better than LP+R for  = 1, while the opposite happens for  = 20.
Denser graphs correspond to better Pivot performance. This is again explained since higher
densities favor better Pivot’s random choices. Runtimes are not afected by graph density. This
is expected as well, as LP+R runtimes are dominated by the time spent in building and solving
the linear program, which depends on the number of vertices only, whereas variations in the
runtimes of Pivot cannot be observed due to the small size of the datasets at hand.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Application to fair clustering</title>
      <p>An important exploitation of our theoretical results concerns the selection of features that lead
edge weights to express the best tradeof between an accurate representation of the objects and
the suitability of the correlation-clustering weights to ensure approximation guarantees. Our
global-weight-bounds condition can be an efective way to the achievement of this tradeof, and
it can be fulfilled more easily than local weight bounds (e.g., in case of probability constraint, it</p>
      <sec id="sec-5-1">
        <title>3Experiments were carried out on the Cresco6 cluster (https://www.eneagrid.enea.it)</title>
        <p>Let  be a set of objects defined over a set of attributes , which is assumed to be partitioned
into two sets:  , which contains fairness-aware or sensitive attributes (e.g., gender, race,
religion), and ¬ , which includes the remaining, non-sensitive attributes. We also assume that
part of the attributes might be numerical, and the others as categorical; we will use superscripts
 and  to distinguish the two types, therefore 
 =  ∪  and ¬ = ¬

∪ ¬ .</p>
        <p>We consider a twofold fair-clustering objective: cluster the objects such that () the
intracluster similarity and the inter-cluster similarity are maximized and minimized, respectively,
according to the non-sensitive attributes; () the intra-cluster similarity and the inter-cluster
similarity are minimized and maximized, respectively, according to the sensitive attributes.
Pursuing this second objective would help distribute similar objects (in terms of sensitive
attributes) across diferent clusters, thus helping the formation of diverse clusters. This is
beneficial to ensure that the distribution of groups defined on sensitive attributes within each
cluster approximates the distribution across the dataset.</p>
        <p>The task of fair clustering can be mapped to a Min-CC instance where the positive-type and
negative-type weights, respectively, can be defined as follows:</p>
        <p>︁(
+ :=  +  ¬ · 
¬ (, ) + (1 −  ¬ ) ·</p>
        <p>¬ (, )
︁(
− :=  −   ·</p>
        <p>(, ) + (1 −   ) · 


 (, )
︁)
︁)
(2)
(3)
|¬ |) −
1) and  −</p>
        <p>= (|¬ |/(| | + |¬ |) −
similarities proportionally to the size of the involved set of attributes,  + = (|
where   = | |/(| |+| |) and  ¬ = |¬ |/(|¬ |+|¬ |) are coeficients to weight
 |/(|</p>
        <p>| +</p>
      </sec>
      <sec id="sec-5-2">
        <title>1) are smoothing factors to penalize</title>
        <p>correlation-clustering weights that are computed on a small number of attributes (which is
any object similarity function defined over the subspace  of the attribute set.
usually the case for sensitive attributes, and hence negative-type weights), and  (· ) denotes
Problem 2 (Attribute Selection for Fair Clustering). Given a set of objects  over the
attribute sets  , ¬ , find maximal subsets  ⊆ 
 and ¬
⊆ 
¬ , with | | ≥
1, |¬ | ≥
1, s.t. the weights computed by Eqs. (2)–(3) satisfy the global-weight-bounds condition in Th. 1.
Heuristics. Our first proposal to solve Problem 2 is a greedy heuristic, dubbed</p>
      </sec>
      <sec id="sec-5-3">
        <title>Greedy, which</title>
        <p>iteratively removes the attribute that leads to the correlation-clustering weights with the lowest
target ratio until our global condition is satisfied. This algorithm runs in
(| |2||2) time
since, at each iteration, for each candidate attribute to be removed (| |2) similarities are
computed to quantify the decrease of the target ratio. We also devised other heuristics which,
like Greedy, remove one attribute at time, but exploit some easy-to-compute proxy measures to
select the attribute that avoid the pairwise similarity computation for each candidate attribute.
The Hlv (resp. Hmv) heuristic removes the least (resp. most) variable attribute where the
variability is measured through normalized entropy for categorical attributes and with variation
coeficient (capped to 1 if above 1) for numerical features. Hlv_B and Hmv_B, like the previous
two heuristics, remove the least and most variable attribute, respectively, but the selection is
constrained to the biggest set of features among  and ¬ , in order to try to balance their
size. Hlv_BW removes the least variable attribute from the set ( or ¬ ) which induces
the highest average similarity value using the current weights, whereas Hmv_SW selects the
most variable attribute from the set which induces the lowest average similarity value using the
current weights. Note that all these heuristics (but Greedy) run in (| |2||) time.
Data and results. We considered 4 relational datasets: CreditCardCustomers, Adult, Bank, and
Student.4 For each of them, Table 1 shows the number of objects, a pair of values corresponding
to the count of non-sensitive and sensitive attributes, and a description of the latter.</p>
        <p>
          Table 2 summarizes results achieved by each of the above heuristics, on the various datasets,
according to the following criteria (columns from left to right): number of iterations at
convergence, target ratio, percentage of pairs ,  having + &gt; −; also, computed w.r.t. the
full attribute space are: value of the objective function, average Euclidean fairness [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], average
number of clusters, intra-cluster and inter-cluster similarities according to either the subset
of sensitive attributes or the subset of non-sensitive attributes, and running time. Euclidean
and Jaccard similarity functions are used for numerical and categorical attributes, resp., and the
overall similarity is obtained by linear combination analogously to Eqs. (2)–(3). Note that higher
values correspond to better performance for  -based intra-cluster and ¬ -based inter-cluster
similarities, while the opposite holds for the other two measures and the Euclidean fairness. The
ifrst row in each table refers to the initial, full-attribute-space status of the relational network,
as a baseline, whereby the global-weight-bounds condition is not satisfied.
        </p>
        <p>Hlv_BW and Hlv_B tend to produce solutions that correspond to the highest (i.e., worst)
value of the objective function and of the clustering size; this should be ascribed to the fact
4https://www.kaggle.com/sakshigoyal7/credit-card-customers; https://archive.ics.uci.edu/ml/index.php
that both heuristics favor the selection of the least variable attributes. By contrast, Hmv_SW is
the best performing in terms of objective function and Euclidean fairness. This method also
tends to produce very few clusters. Note that, while a higher number of clusters is found to be
coupled with a worsening of the objective function, the opposite does not hold in general; also,
in contrast to the intuition that a higher percentage of pairs having + &gt; − should favor the
grouping into fewer clusters, we observed that the clustering sizes are not necessarily ordered
as with the percentage ordering. As far as eficiency, Hmv_B mostly provides the best time
performance. While being one order of magnitude slower, Greedy tends to converge in less
iterations, as it indeed removes fewer attributes than the other methods; we found that in some
cases (e.g., Student, Adult, results not shown), this allows Greedy for compensating its expected
higher cost per iteration. Notably, each method lowers the initial target ratio below 1 so as to
satisfy the global condition, and the per-dataset best-performing method (not shown) improves
all intra-/inter-cluster similarities and Euclidean fairness w.r.t. the baseline.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>We discussed a novel perspective in correlation clustering, which considers global weight
bounds. A suficient condition is defined to extend the range of validity of approximation
guarantees beyond local weight bounds, such as the probability constraint. Experimental results,
including a case study in fair clustering, put in evidence of the usefulness of our approach.</p>
      <p>
        One interesting future direction we plan to investigate is to define the edge weights (bounds)
based on relational properties of the objects under an uncertain data modeling framework [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chawla</surname>
          </string-name>
          , Correlation clustering,
          <source>Mach. Learn</source>
          .
          <volume>56</volume>
          (
          <year>2004</year>
          )
          <fpage>89</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>García-Soriano</surname>
          </string-name>
          , E. Liberty,
          <article-title>Correlation clustering: from theory to practice</article-title>
          ,
          <source>in: Proc. ACM KDD Conf.</source>
          ,
          <year>2014</year>
          , p.
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mandaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gullo</surname>
          </string-name>
          ,
          <article-title>Correlation Clustering with Global Weight Bounds</article-title>
          ,
          <source>in: Proc. ECML-PKDD</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>499</fpage>
          -
          <lpage>515</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -86520-7\_
          <fpage>31</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ailon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Charikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <article-title>Aggregating inconsistent information: Ranking and clustering</article-title>
          ,
          <source>JACM</source>
          <volume>55</volume>
          (
          <year>2008</year>
          )
          <volume>23</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          :
          <fpage>27</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shamir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsur</surname>
          </string-name>
          ,
          <article-title>Cluster graph modification problems</article-title>
          , Discret. Appl. Math.
          <volume>144</volume>
          (
          <year>2004</year>
          )
          <fpage>173</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Charikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Guruswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wirth</surname>
          </string-name>
          ,
          <article-title>Clustering with qualitative information</article-title>
          ,
          <source>JCSS</source>
          <volume>71</volume>
          (
          <year>2005</year>
          )
          <fpage>360</fpage>
          -
          <lpage>383</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Demaine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Emanuel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Immorlica</surname>
          </string-name>
          ,
          <article-title>Correlation clustering in general weighted graphs</article-title>
          ,
          <source>TCS</source>
          <volume>361</volume>
          (
          <year>2006</year>
          )
          <fpage>172</fpage>
          -
          <lpage>187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Puleo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Milenkovic</surname>
          </string-name>
          ,
          <article-title>Correlation clustering with constrained cluster sizes and extended weights bounds</article-title>
          ,
          <source>SIAM J. Optim</source>
          .
          <volume>25</volume>
          (
          <year>2015</year>
          )
          <fpage>1857</fpage>
          -
          <lpage>1872</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Abraham</surname>
          </string-name>
          , D. P,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sundaram</surname>
          </string-name>
          ,
          <article-title>Fairness in clustering with multiple sensitive attributes</article-title>
          ,
          <source>in: Proc. EDBT Conf</source>
          .,
          <year>2020</year>
          , pp.
          <fpage>287</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gullo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ponti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <article-title>An information-theoretic approach to hierarchical clustering of uncertain data</article-title>
          ,
          <source>Inf. Sci</source>
          .
          <volume>402</volume>
          (
          <year>2017</year>
          )
          <fpage>199</fpage>
          -
          <lpage>215</lpage>
          . doi:
          <volume>10</volume>
          .1016/j.ins.
          <year>2017</year>
          .
          <volume>03</volume>
          . 030.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>