<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Correlation Clustering with Fairness Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Gullo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucio La Cava</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico Mandaglio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Tagarelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. Computer Engineering</institution>
          ,
          <addr-line>Modeling, Electronics, and Systems Engineering (DIMES)</addr-line>
          ,
          <institution>University of Calabria</institution>
          ,
          <addr-line>Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Information Engineering</institution>
          ,
          <addr-line>Computer Science, and Mathematics (DISIM)</addr-line>
          ,
          <institution>University of L'Aquila</institution>
          ,
          <addr-line>Coppito (AQ)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Fairness in data analysis is a well-established and growing area of research, offering tools to understand and mitigate specific forms of bias in decision-making systems. One of the key challenges in this context is fair clustering, i.e., grouping data objects that are similar according to a common feature space, while avoiding biasing the clusters against or towards particular types of classes or sensitive features. In this work, we discuss a correlation-clustering method we recently introduced and analyze its performance in fairness-aware scenarios. We compare it with representative state-of-the-art fair clustering approaches, considering both standard clustering quality metrics and fairness-related measures. Empirical results on publicly available datasets show that the method produces clusterings of higher quality according to traditional validation criteria, while also accounting for fairness considerations.</p>
      </abstract>
      <kwd-group>
        <kwd>clustering</kwd>
        <kwd>fairness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>We live in an era in which machine learning is increasingly pervasive in our society.</title>
        <p>Every day, we interact with machine learning systems (often without being aware of it), and these systems are gaining growing decision-making power in various aspects of our lives. For instance, they are used to support, or even replace, human decision makers in financial, medical, and legal domains.</p>
        <p>
          Given their critical role, machine learning systems should ensure reliable and fair behavior, avoiding
discrimination against those who are subject to their decisions. However, a major challenge arises
from the data on which these systems are trained: such data are often intrinsically biased, typically
due to flawed or unbalanced data collection processes. It is therefore important to prevent machine
learning algorithms from being influenced by, or amplifying, these biases. For example, the work in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
addresses this issue through the notion of disparate impact, which refers to the principle that no group
of individuals should be (even indirectly) disadvantaged by the decisions of an automated system.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>In this work, we focus on the problem of fair clustering.</title>
        <p>Fair clustering is an unsupervised machine learning task whose goal is twofold: (i) as in standard clustering, similar objects are assigned to the same cluster, whereas dissimilar objects are assigned to different clusters; and (ii) the resulting clusters are not dominated by a specific type of sensitive data class (e.g., individuals sharing the same gender).</p>
      </sec>
      <sec id="sec-1-3">
        <title>Our key assumption is that the fair clustering problem can be effectively addressed within the framework of correlation clustering [2, 3]. This well-established approach partitions the vertices of a graph into clusters by maximizing intra-cluster similarity and minimizing inter-cluster similarity, based on pairwise weights that encode positive or negative co-association.</title>
        <p>
          Contributions. In this paper, we discuss the use of correlation clustering for fair clustering, with
a focus on our recent algorithm [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and compare it to a selection of state-of-the-art methods from
diverse algorithmic paradigms. Our analysis explores how emphasizing fairness objectives can influence
the clustering quality, as measured by classic clustering-validation criteria. Our findings show that
our proposed algorithm yields higher-quality solutions than competing methods from a clustering
perspective, while also accounting for fairness-related aspects.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Although of relatively recent definition, the problem of fairness in clustering has received considerable attention in the literature [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. With their seminal work, Chierichetti et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] were among the first to formalize the notions around fair clustering and the related problem, following the disparate-impact doctrine [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Their main contribution is a general pre-processing step, i.e., fairlets decomposition, to enable traditional algorithms (e.g., k-center and k-median) to meet fairness principles. Following that forerunner work, fairness has become pervasive in the clustering landscape [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], leading to fairness-aware variants of numerous traditional clustering formulations, such as k-center [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], k-means [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], k-median [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], spectral clustering [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and hierarchical clustering [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      </p>
      <p>
        The phenomenon of fairness in clustering has also been extended to alternative approaches, such as
correlation clustering. In this regard, Ahmadian et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is the first work to leverage the correlation
clustering model for the fair clustering task. More specifically, it takes a complete and undirected graph
as input, where vertices are assigned a (single) label representing a given protected class attribute
(e.g., sex or ethnicity), and the goal is to provide a fair representation of each considered label in
the resulting clusters. Recently, Mandaglio et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] proposed to model the fair clustering problem
of a relational dataset as a correlation clustering instance. Given a set of objects, defined over a set
of features, Mandaglio et al. build an associated correlation clustering instance by considering the
similarity between the tuples. Although Ahmadian et al.'s and Mandaglio et al.'s approaches aim to cluster different types of data (graphs and tuples, respectively), both reduce the original problem to a correlation clustering instance. However, Mandaglio et al.'s formulation is more general than Ahmadian et al.'s, since the former deals with an arbitrary number of labels (or sensitive attributes), while the latter is limited to a single-label setting.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <p>
        Correlation clustering. The correlation clustering problem, originally introduced by Bansal et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
consists of clustering the set of vertices of a graph whose edges are assigned two nonnegative weights,
named positive-type and negative-type weights, respectively. Such weights express the advantage
of putting any two connected vertices into the same cluster (positive-type weight) or into separate
clusters (negative-type weight). The objective is to partition the vertices so as to minimize the sum of
the negative-type weights between vertices within the same cluster plus the sum of the positive-type
weights between vertices in separate clusters (Min-CC).
      </p>
      <p>
        Problem 1 (Min-CC [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]). Given an undirected graph G = (V, E), with vertex set V and edge set E ⊆ V × V, and weights w⁺_uv, w⁻_uv ∈ ℝ⁺₀ for all edges (u, v) ∈ E, find a clustering f : V → ℕ⁺ that minimizes:
      </p>
      <p>∑_{(u,v)∈E, f(u)=f(v)} w⁻_uv + ∑_{(u,v)∈E, f(u)≠f(v)} w⁺_uv.   (1)</p>
      <p>
        Min-CC is APX-hard [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], but admits approximation algorithms [
        <xref ref-type="bibr" rid="ref16 ref17 ref2">16, 2, 17</xref>
        ] with guarantees depending on the type of input graph. On general graphs and weights, the best known approximation factor is O(log |V|) [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ], provided by a linear programming approach. Conversely, constant-factor approximation algorithms are possible if the graph is complete and the edge weights satisfy the probability constraint, i.e., w⁺_uv + w⁻_uv = 1 for all u, v ∈ V. Among these, the one providing the best trade-off between efficiency and theoretical guarantees is the Pivot algorithm [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which simply picks a random vertex u, builds a cluster composed of u and all the vertices v such that an edge with w⁺_uv &gt; w⁻_uv exists, and removes that cluster from the graph. The process is repeated until the graph becomes empty.
      </p>
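      <p>The Pivot procedure just described can be sketched in a few lines. The following is a minimal, self-contained Python illustration under the notation above; the function and variable names are ours, not from the original paper:</p>
      <preformat>
```python
import random

def pivot(vertices, w_pos, w_neg, seed=0):
    """Pivot for Min-CC: pick a random pivot u, cluster it with every
    remaining vertex v whose positive-type weight w+ exceeds its
    negative-type weight w-, remove the cluster, and repeat."""
    rng = random.Random(seed)
    remaining = set(vertices)
    clustering, label = {}, 0
    while remaining:
        u = rng.choice(sorted(remaining))
        cluster = {u} | {v for v in remaining - {u}
                         if w_pos[frozenset((u, v))] > w_neg[frozenset((u, v))]}
        for v in cluster:
            clustering[v] = label  # assign the same cluster id to the whole cluster
        remaining -= cluster
        label += 1
    return clustering
```
      </preformat>
      <p>For instance, on a 4-vertex complete instance where only the pairs (0, 1) and (2, 3) have w⁺ &gt; w⁻, the algorithm returns those two clusters regardless of which pivots are drawn.</p>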
      <sec id="sec-3-1">
        <title>This algorithm has O(|E|) time complexity, and it achieves a factor-5 expected approximation guarantee for Min-CC</title>
        <p>
          under the probability constraint or if a global weight bound holds on the overall edge weights [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <sec id="sec-3-1-1">
          <title>We next discuss how to solve the fair clustering problem via a Min-CC approach.</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Algorithm 1: CCBounds</title>
        <p>Input: Set of objects 𝒪, sensitive attributes F, non-sensitive attributes ¬F, Min-CC algorithm 𝒜</p>
        <p>Output: Clustering f of 𝒪</p>
        <p>1: compute s_¬F(x, y) and s_F(x, y), ∀ x, y ∈ 𝒪, as in Eqs. (3)–(4)</p>
        <p>2: build the Min-CC instance I = ⟨G = (𝒪, 𝒪 × 𝒪), {s_¬F(x, y), s_F(x, y)}_{x,y ∈ 𝒪}⟩</p>
        <p>3: f ← run 𝒜 on I</p>
      </sec>
      <sec id="sec-3-3">
        <title>Problem statement.</title>
        <p>Let 𝒪 = {o₁, · · · , o_N} be a set of N objects defined over a set of attributes. The latter is assumed to be divided into two sets, F and ¬F. The F set contains fairness-aware, or sensitive, attributes such as those identifying sex, race, religion, or relationship status in a citizen database, and any other attribute over which fairness is to be ensured. ¬F denotes the attributes that are relevant to the task of interest, and thus can be regarded as non-sensitive. In both cases, we assume that part of the attributes might be numerical, and the others categorical (binary or multi-value). We use subscripts n and c to distinguish the two types, therefore F = F_n ∪ F_c and ¬F = ¬F_n ∪ ¬F_c.</p>
        <p>
          We consider a clustering task whose goal is to partition the input objects with a twofold objective: (i) minimize the inter-cluster similarity according to the non-sensitive attributes ¬F; (ii) minimize the intra-cluster similarity according to the sensitive attributes F. The former objective corresponds to the typical clustering objective, since dissimilar objects should belong to different clusters. Pursuing the second objective, instead, helps distribute objects that are similar in terms of sensitive attributes across different clusters, thus fostering the formation of clusters that are equally represented in terms of the sensitive attributes. This is beneficial to ensure that the distribution of groups defined on sensitive attributes within each cluster approximates the distribution across the dataset. Formally, the problem we tackle in this work is:
        </p>
        <p>
          Problem 2 (Fair-CC [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]). Given a set of objects 𝒪, two subsets of attributes F and ¬F, and an object similarity function s_X(·, ·) defined over the subspace X of the attribute set, find a clustering f* to minimize:
        </p>
        <p>∑_{x,y∈𝒪, f(x)=f(y)} s_F(x, y) + ∑_{x,y∈𝒪, f(x)≠f(y)} s_¬F(x, y).   (2)</p>
        <p>
          The objective in Eq. (2) corresponds to solving a complete Min-CC instance where the set of vertices corresponds to the objects in 𝒪 and, for each pair of vertices x and y, the positive-type (resp. negative-type) correlation-clustering weight corresponds to the similarity score between the two vertices according to the non-sensitive (resp. sensitive) attributes.
        </p>
        <p>4. Algorithm</p>
        <p>
          The Fair-CC problem requires a similarity function over a set of attributes. We quantify the similarity between objects x and y, based on non-sensitive and sensitive attributes, by means of the following s_¬F(x, y) and s_F(x, y) measures, respectively:
        </p>
        <p>s_¬F(x, y) := e^{η⁺} (α_¬F · sⁿ_¬F(x, y) + (1 − α_¬F) · sᶜ_¬F(x, y)),   (3)</p>
        <p>s_F(x, y) := e^{η⁻} (α_F · sⁿ_F(x, y) + (1 − α_F) · sᶜ_F(x, y)),   (4)</p>
        <p>
          where α_F = |F_n|/(|F_n| + |F_c|) and α_¬F = |¬F_n|/(|¬F_n| + |¬F_c|) are coefficients to weight the numerical and categorical similarities proportionally to the number of involved attributes, and η⁺ = |¬F|/(|F| + |¬F|) − 1 and η⁻ = |F|/(|F| + |¬F|) − 1 are smoothing factors to penalize correlation-clustering weights that are computed on a small number of attributes. The latter is reasonable as, in a fair clustering task, we usually have fewer sensitive attributes, and it should be avoided that negative-type weights dominate the positive-type ones. The exponential function enables a mild smoothing, which is desirable.
        </p>
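        <p>The weighting scheme of Eqs. (3)–(4) can be sketched as follows. This is a hedged illustration: the per-attribute base similarities (1 − |difference| on normalized numerical values, exact match on categorical values) are our own assumptions for the sake of a runnable example, not prescribed by the equations; the function name is ours as well.</p>
        <preformat>
```python
import math

def weighted_similarity(x, y, num_attrs, cat_attrs, n_other_attrs):
    """Eqs. (3)-(4): alpha-weighted mix of numerical and categorical
    similarities over a subspace X, exponentially smoothed by eta.
    n_other_attrs is the size of the complementary subspace (|not-X|)."""
    n_x = len(num_attrs) + len(cat_attrs)           # |X|
    alpha = len(num_attrs) / n_x if n_x else 0.0    # |X_n| / (|X_n| + |X_c|)
    eta = n_x / (n_x + n_other_attrs) - 1.0         # |X| / (|X| + |not-X|) - 1
    # assumed base similarities: 1 - |x - y| on normalized numeric values,
    # fraction of matching categorical values
    s_n = (sum(1 - abs(x[a] - y[a]) for a in num_attrs) / len(num_attrs)) if num_attrs else 0.0
    s_c = (sum(x[a] == y[a] for a in cat_attrs) / len(cat_attrs)) if cat_attrs else 0.0
    return math.exp(eta) * (alpha * s_n + (1 - alpha) * s_c)
```
        </preformat>
        <p>With a single sensitive attribute and five non-sensitive ones, the e^{η⁻} factor shrinks the sensitive-side score, which is exactly the penalization for weights computed on few attributes.</p>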
        <sec id="sec-3-3-1">
          <title>As Fair-CC is an instance of Min-CC, it can be solved by Min-CC algorithms.</title>
          <p>
            Our proposed algorithm [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], dubbed CCBounds and presented in Algorithm 1, consists of building a Min-CC instance with vertices as the input data objects and edge weights as the similarity scores, and then running a Min-CC algorithm 𝒜 on such a Min-CC instance.
          </p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>Theoretical remarks. Let T_𝒜(𝒪) be the running time of the algorithm 𝒜 on the set of data objects 𝒪.</title>
        <p>
          CCBounds runs in O(|𝒪|²|A| + T_𝒜(𝒪)) time, since it needs to compute a similarity score, over the |A| attributes, for each pair of objects in 𝒪, and then solve the resulting Min-CC instance through algorithm 𝒜. Also, the space complexity of CCBounds is O(|𝒪|²), for storing the similarity scores in memory. The specific Min-CC algorithm 𝒜 used in CCBounds is the one proposed in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], since it provides (under the probability constraint or the global weight bound stated in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]) a constant-factor approximation guarantee in expectation. Also, taking linear time in the size of the input graph, it is, to the best of our knowledge, the most efficient algorithm in the Min-CC literature. As a result of this choice, the time complexity of CCBounds becomes O(|𝒪|²|A|).
        </p>
        <sec id="sec-3-4-1">
          <title>Another appealing aspect of the fact that Fair-CC is an instance of Min-CC is that Fair-CC inherits the following theoretical result:</title>
          <p>
            Theorem 1 ([
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]). If the condition (|𝒪|(|𝒪| − 1)/2)⁻¹ ∑_{x,y∈𝒪} (s_¬F(x, y) + s_F(x, y)) ≥ max_{x,y∈𝒪} |s_¬F(x, y) − s_F(x, y)| holds on the similarity scores and the oracle 𝒜 is an α-approximation algorithm for Min-CC, then CCBounds is an α-approximation algorithm for Fair-CC.
          </p>
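          <p>The condition of Theorem 1 is straightforward to verify on a concrete instance: the average pairwise total weight must dominate the largest pairwise gap between the two similarity scores. A small check, assuming pair-keyed similarity dictionaries (the naming is ours, for illustration only):</p>
          <preformat>
```python
from itertools import combinations
from math import comb

def theorem1_condition_holds(objs, s_nf, s_f):
    """True iff the mean of s_nF(x,y) + s_F(x,y) over all pairs is at least
    the maximum of |s_nF(x,y) - s_F(x,y)| over all pairs."""
    pairs = list(combinations(objs, 2))
    avg_total = sum(s_nf[p] + s_f[p] for p in pairs) / comb(len(objs), 2)
    max_gap = max(abs(s_nf[p] - s_f[p]) for p in pairs)
    return avg_total >= max_gap
```
          </preformat>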
        </sec>
        <sec id="sec-3-4-2">
          <title>The above theorem provides an approximation guarantee on the Fair-CC objective (cf. Eq. (2)), which combines the cluster-quality measure (first summation) and the fairness-related objective (second summation). It is not known how this guarantee translates into each individual objective, e.g., the fairness objective. This is a challenging open question, which we defer to future studies.</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Fairness Evaluation</title>
      <sec id="sec-4-1">
        <title>In this section, we summarize the most commonly adopted metrics for the evaluation of fairness aspects in clustering.</title>
        <p>
          We focus on algorithm-independent measures, i.e., measures able to generalize across multiple methods, following a group-level approach under the disparate-impact doctrine [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          Balance. It is one of the most adopted evaluation metrics for fairness in clustering, initially proposed by Chierichetti et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] in a context with one sensitive attribute with two protected groups. It has been successively generalized to ℓ protected groups by Bera et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. According to the latter, the balance of a clustering solution C can formally be defined as follows [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]:
        </p>
        <p>balance(C) = min_{c∈C, i∈[ℓ]} min{r_{c,i}, 1/r_{c,i}} ∈ [0, 1],   (5)</p>
        <p>
          where r_{c,i} is the ratio between the proportion of the objects belonging to a given protected group i in the considered dataset and in a given cluster c ∈ C.
        </p>
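        <p>Eq. (5) translates directly into code. A sketch, with our own naming conventions (clusters as lists of object ids, a dictionary mapping each object to its protected group); a group entirely missing from some cluster yields balance 0:</p>
        <preformat>
```python
def balance(clusters, group_of):
    """min over clusters c and groups g of min(r, 1/r), where r is the
    dataset-vs-cluster ratio of the proportion of group g."""
    n = sum(len(c) for c in clusters)
    labels = sorted(set(group_of.values()))
    dataset_prop = {g: sum(1 for o in group_of if group_of[o] == g) / n
                    for g in labels}
    best = 1.0
    for c in clusters:
        for g in labels:
            prop_c = sum(1 for o in c if group_of[o] == g) / len(c)
            if prop_c == 0:
                return 0.0  # fully unbalanced cluster: group g is absent
            r = dataset_prop[g] / prop_c
            best = min(best, r, 1 / r)
    return best
```
        </preformat>
        <p>Two clusters that each mirror the dataset-level group proportions score 1.0; a clustering that segregates the groups scores 0.0.</p>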
        <p>
          In such a formulation, the lower and upper bounds of a cluster indicate the fully unbalanced and perfectly balanced scenarios, respectively: the former indicates the case where all the objects in the cluster pertain to the same protected group, whereas the latter denotes an equal number of objects from each of the protected groups. Therefore, the higher the balance, the better the obtained solution in terms of equality. Additionally, the considered generalization allows us to obtain a comprehensive evaluation of the balance of our clustering solutions, as it looks at the dataset context, i.e., it will return high scores provided that the balances of the clustering and the input dataset are comparable.
        </p>
        <p>
          Average Euclidean Fairness. This metric was introduced by Abraham et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] to estimate unfairness by assessing the deviation between the representation of groups, focusing on the sensitive attributes, in the whole dataset and in the given clustering solution. It expresses the cluster-size-weighted average of cluster-level deviations (i.e., Euclidean distances) between two frequency (sensitive-)attribute vectors, namely v_D, which is computed over the entire set of objects, and v_c, which is computed for each cluster c ∈ C, focusing on a sensitive attribute a ∈ F. Formally, it is defined as:
        </p>
        <p>AEF(C) = (∑_{c∈C} |c| × d(v_D, v_c)) / |𝒪|,   (6)</p>
        <p>
          where d represents the Euclidean distance between the frequency attribute vectors. Since a can be multi-valued, such a formulation is suited to scenarios where there are multiple protected groups. Also, as this measure is a deviation, smaller values correspond to better solutions.
        </p>
        <p>¹ https://github.com/Ralyhu/globalCC (implementation of CCBounds)</p>
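        <p>Eq. (6) can be sketched as follows, again with our own naming (clusters as lists of object ids, a dictionary mapping objects to their sensitive group):</p>
        <preformat>
```python
import math
from collections import Counter

def avg_euclidean_fairness(clusters, group_of):
    """Cluster-size-weighted average Euclidean distance between the
    dataset-level and cluster-level group-frequency vectors (Eq. (6))."""
    objs = [o for c in clusters for o in c]
    labels = sorted(set(group_of.values()))

    def freq_vector(members):
        cnt = Counter(group_of[o] for o in members)
        return [cnt[g] / len(members) for g in labels]

    v_dataset = freq_vector(objs)
    total = sum(len(c) * math.dist(v_dataset, freq_vector(c)) for c in clusters)
    return total / len(objs)
```
        </preformat>
        <p>A clustering whose clusters all mirror the dataset-level group frequencies scores 0, the best possible value.</p>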
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Experimental Methodology</title>
      <p>6.1. Competing Methods</p>
      <sec id="sec-5-1">
        <title>In the following, we briefly overview the competing methods we included in our experiments. For each of those methods, we used publicly available code, which we adopted “as-is”, i.e., without making any changes or optimizations.</title>
        <p>
          Fair Clustering Through Fairlets [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This method, here dubbed Fairlets, is one of the pillars of fair clustering. It is based on the notion of fairlets decomposition, that is, a grouping of the input objects into fairlets, i.e., minimal subsets of objects that satisfy a given fairness definition, while preserving the clustering objective. Given a good fairlets decomposition, this approach requires traditional clustering algorithms (i.e., k-center or k-median) to be applied on the centers of the obtained fairlets, to yield the “fair” solutions. Fairlets supports two types of fairlets decomposition: an accurate one based on min-cost flow (MCF), and a more efficient one. We hereinafter refer to these decompositions as MCF decomposition and vanilla decomposition, respectively. A major limitation of Fairlets is that it can handle a single sensitive binary attribute only. We will discuss the impact of this limitation in more detail in Section 7.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>HST-based Fair Clustering [11].</title>
        <p>This approach, here dubbed HST-FC, focuses on the k-median formulation, and employs a quad-tree decomposition to embed the objects in a tree metric, called HST.</p>
      </sec>
      <sec id="sec-5-3">
        <title>By leveraging such a tree, HST-FC computes an approximate fairlets decomposition.</title>
        <p>A fair clustering is ultimately obtained by running k-median algorithms on the produced fairlets. Like Fairlets, HST-FC suffers from the limitation that it deals with one binary sensitive attribute only.</p>
        <p>
          Fair Correlation Clustering [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. This method, here dubbed Signed, introduces a fairlet-based reduction for the graph-clustering scenario with respect to the problem of correlation clustering, leading to the concept of correlation clustering with fairness constraints. Specifically, given a signed graph, i.e., an undirected graph with edges labeled as positive or negative, the algorithm performs a fairlets decomposition (under different fair settings) over the set of vertices. The produced decomposition is used, together with the original graph, to build a reduced (complete and unweighted) correlation clustering instance, where the vertices correspond to the produced fairlets and the sign of the edge between any two fairlets is set according to the majority sign of the edges between vertices within those two fairlets. A clustering on this reduced correlation clustering instance is computed through local-search optimization starting from all singleton clusters, and then expanded into a solution of the original problem. As a fair setting for the fairlets decomposition, we consider the most common case of fair decomposition, where clusters are required not to be dominated by a sensitive data class. As the Signed method requires a signed graph as input, we perform the following preprocessing step to make the relational data compatible with this format. We derive a complete graph whose vertices are the original data objects, and an edge (x, y) is labeled as positive with probability p⁺_xy = max{0, s_¬F(x, y) − s_F(x, y)} and as negative with probability 1 − p⁺_xy, where the similarity functions are the ones defined in Eqs. (3)–(4). We point out that, although we could adapt the same weighting strategy as CCBounds to obtain the edge attributes, we discarded this choice as our experiments showed that it favors the emergence of a degenerate clustering solution (i.e., a single output cluster), due to the strong predominance of positive weights on the edges.
        </p>
        <p>6.2. Data</p>
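        <p>The preprocessing step just described, i.e., turning the two similarity scores of each pair into a random positive/negative edge label, can be sketched as follows (function name and data layout are ours, for illustration):</p>
        <preformat>
```python
import random

def to_signed_graph(pairs, s_nf, s_f, seed=0):
    """Label each pair (x, y) as '+' with probability
    p+ = max(0, s_nF(x, y) - s_F(x, y)), and as '-' otherwise."""
    rng = random.Random(seed)
    signs = {}
    for xy in pairs:
        p_pos = max(0.0, s_nf[xy] - s_f[xy])
        signs[xy] = '+' if rng.random() < p_pos else '-'
    return signs
```
        </preformat>
        <p>Pairs where the non-sensitive similarity fully dominates (p⁺ = 1) always become positive edges, while pairs dominated by sensitive similarity (p⁺ = 0) always become negative edges.</p>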
      </sec>
      <sec id="sec-5-4">
        <title>We considered five real-world relational datasets, which have been commonly used in the fair clustering literature.</title>
        <p>The main characteristics of these datasets are summarized in Table 1. As reported in the table, in our evaluation we focused on a smaller subset of the original attributes; note that this is a common practice, which is adopted, among others, by the competing methods outlined above. Adult2 reports information about the 1994 US Census. For each tuple, representing an individual, we considered age, fnlwgt, education-num, capital-gain, and hours-per-week as non-sensitive attributes, and sex (i.e., male or female) as a sensitive attribute.</p>
        <p>Bank2 provides details on phone calls involving direct marketing campaigns of a Portuguese banking institution, to assess whether a bank term deposit will be subscribed or not. We considered attributes age, balance, and duration as non-sensitive, and marital status (i.e., married or not) as sensitive.
CreditCard3 concerns customer credit card services to estimate customer attrition. We considered attributes customer_age, dependent_count, avg_utilization_ratio, and total_relationship_count as non-sensitive, and sex as sensitive.</p>
        <p>Diabetes2 reports diabetic patient records, for which we considered age and time_in_hospital as non-sensitive attributes, and sex as a sensitive attribute.</p>
        <p>Student2 contains student performances in Mathematics and Portuguese language in the secondary education of two Portuguese schools. We considered age, study_time, and absences as non-sensitive attributes, and sex as sensitive.</p>
        <p>6.3. Evaluation Goals</p>
      </sec>
      <sec id="sec-5-5">
        <title>Our evaluation objectives concern both fairness and quality aspects of clustering.</title>
        <p>In the first case, we use the fairness metrics defined in Section 5, which allow us to have a group-wide overview of how a method behaves in terms of fairness principles. In the second case, we assess the quality of clustering by means of intra- and inter-cluster similarity, considering both the sensitive and non-sensitive attributes, as described below. Finally, we evaluate running times.</p>
        <p>Intra/Inter-cluster similarity. As stated in Section 3, we take into account the intra-cluster, resp. inter-cluster, similarity among objects to properly distribute them into clusters, focusing on either their sensitive or non-sensitive attributes (cf. Eqs. (3) and (4)). We define the following aggregated scores to have an overall measure of goodness of the clusters:</p>
        <p>intra(¬F) = (1/|Ω|) ∑_{x,y∈Ω} s_¬F(x, y),   intra(F) = (1/|Ω|) ∑_{x,y∈Ω} s_F(x, y),   (7)</p>
        <p>inter(¬F) = (1/|Θ|) ∑_{x,y∈Θ} s_¬F(x, y),   inter(F) = (1/|Θ|) ∑_{x,y∈Θ} s_F(x, y),   (8)</p>
        <p>where Ω = {x, y ∈ 𝒪 | f(x) = f(y)} and Θ = {x, y ∈ 𝒪 | f(x) ≠ f(y)}. In particular, to obtain fair clusters, we need to maximize (resp. minimize) the inter(F), resp. intra(F), scores, so that objects having the same set of sensitive attributes will not be clustered together, but rather will be well-distributed across clusters.</p>
      </sec>
      <sec id="sec-5-6">
        <title>2https://archive.ics.uci.edu/ml/datasets/</title>
      </sec>
      <sec id="sec-5-7">
        <title>3https://www.kaggle.com/sakshigoyal7/credit-card-customers</title>
        <p>Conversely, we require to maximize, resp. minimize, the intra(¬F), resp. inter(¬F), scores, to ensure that objects with the same set of non-sensitive attributes will be clustered close to each other and not scattered across different clusters.</p>
      </sec>
      <sec id="sec-5-8">
        <title>Running times. We measure the running times of CCBounds and the competing methods while executing them on the Cresco6 cluster.4</title>
        <p>6.4. Hyper-parameters and Configurations</p>
        <p>
          Data sampling and attribute selection. To test the selected competing methods under different conditions, and to run even the most computationally expensive approaches, we adopt the sampling strategy proposed in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Specifically, by sampling (without replacement) we extracted 1k or 10k tuples from the original full set of tuples, preserving some desired ratio between the protected classes. The details of the sampling strategy used in our experiments are reported in Table 2, where the selected fair attributes and split ratio (i.e., the fraction of tuples pertaining to different sensitive attribute values) are, whenever possible, the same as in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Also, both Fairlets and HST-FC require two integers p and q as input, whose ratio p/q corresponds to the minimum balance required of each cluster yielded by these algorithms. The configuration of these parameters, inspired by [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ], is reported in Table 2.
        </p>
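        <p>The ratio-preserving sampling just described can be sketched as follows, for the binary-attribute case considered in our experiments (function name and argument layout are ours):</p>
        <preformat>
```python
import random

def stratified_sample(objects, group_of, n, ratio, groups, seed=0):
    """Sample n objects without replacement, preserving a desired split
    ratio (e.g. 0.8) of the first protected group over the second."""
    rng = random.Random(seed)
    first = [o for o in objects if group_of[o] == groups[0]]
    second = [o for o in objects if group_of[o] == groups[1]]
    k = round(n * ratio)
    return rng.sample(first, k) + rng.sample(second, n - k)
```
        </preformat>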
      </sec>
      <sec id="sec-5-9">
        <title>We highlight that, as described so far, we focus on a single, binary sensitive attribute, so as to match the minimum requirements supported by all competing methods.</title>
        <p>Nonetheless, some approaches (including our CCBounds) can deal with multiple values assigned to a single sensitive attribute.</p>
      </sec>
      <sec id="sec-5-10">
        <title>Number of clusters.</title>
        <p>While Fairlets and HST-FC require a hyper-parameter k in input, denoting the desired number of output clusters, the same does not apply to the correlation clustering-based approaches. Thus, to create a reasonable comparative environment, we use the (rounded) average number of clusters returned by CCBounds in ten iterations as the k parameter for Fairlets and HST-FC. Moreover, we inherit the value of k from the nearest subset when the correlation clustering-based approaches run out of memory.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7. Results</title>
      <p>Table 3 summarizes the results achieved by CCBounds and the competing methods. With the exception of very high running times and out-of-memory errors (indicated with NA and OOM, respectively), all reported measurements correspond to averages over 10 runs of the tested algorithms. The similarity values (Eqs. (7)–(8)) were obtained by using Euclidean and Jaccard similarities for numerical and categorical attributes, respectively.</p>
      <p>[Table 3: for each dataset (Adult-1k/10k/Full, Bank-1k/10k/Full, CreditCard-1k/10k, Diabetes-1k/10k/Full, Student-1k) and each method (CCBounds, Fairlets, HST-FC, Signed), the table reports intra(¬F) ↑, intra(F) ↓, inter(¬F) ↓, inter(F) ↑, computed over the set of selected sensitive attributes or the set of non-sensitive attributes (cf. Table 1), and running time (s) ↓. For each criterion, bold values correspond to the best-performing methods (possibly up to the second decimal point).]</p>
      <sec id="sec-6-1">
        <p>As discussed in Section 6.1, we report results only for the vanilla fairlets decomposition, since the min-cost-flow (MCF) counterpart has very high running times (more than 7 minutes on the smallest dataset, i.e., Student-1k) and produces solutions that are very similar to the vanilla ones (results not shown for the sake of brevity).</p>
      </sec>
      <sec id="sec-6-2">
        <p>As for the balance, we notice that, although CCBounds does not match the high scores obtained by the “fairness-native” methods (i.e., Fairlets and HST-FC), it still scores comparably to its direct competitor, i.e., Signed. Exceptions arise in the case of Student-1k and Diabetes-1k, where CCBounds settles for lower scores, and on some large datasets, where Signed does not terminate in reasonable time while our CCBounds still obtains good results in reasonable time. The picture changes when we consider small yet heavily unbalanced datasets (i.e., CreditCard-1k, with an 80:20 ratio): here, although several competing methods struggle to obtain high scores, CCBounds achieves the second-best balance score. Overall, since the balance obtained by CCBounds ranges from 0.45 to 0.613 across all evaluation scenarios, we can conclude that it is able to guarantee satisfactory balance scores.</p>
      </sec>
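      <p>To make the balance criterion concrete, the following minimal sketch computes the balance of a clustering with respect to a binary sensitive attribute, in the spirit of the fairlets notion of Chierichetti et al. [6] (the function name <italic>cluster_balance</italic> and the exact handling of single-group clusters are our own assumptions):</p>

```python
from collections import Counter

def cluster_balance(labels, groups):
    """Balance of a clustering w.r.t. a binary sensitive attribute:
    per cluster, the ratio between the minority and majority group
    counts; overall, the worst (minimum) such ratio across clusters."""
    clusters = {}
    for lab, g in zip(labels, groups):
        clusters.setdefault(lab, Counter())[g] += 1
    balance = 1.0
    for counts in clusters.values():
        values = sorted(counts.values())
        if len(values) < 2:  # a cluster containing one group only has balance 0
            return 0.0
        balance = min(balance, values[0] / values[-1])
    return balance
```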
      <sec id="sec-6-4">
        <p>In the case of avg. Euclidean fairness, CCBounds obtains very good scores under different scenarios: it is among the best-performing approaches on the Adult-1k, Adult-Full, and Bank-1k datasets, and outperforms all the other methods by an order of magnitude on Bank-10k and Bank-Full. Conversely, on the remaining datasets, CCBounds does not reach the top scores of some competitors.</p>
        <p>Considering the similarity computed on the sensitive attributes, CCBounds does not achieve the best intra-cluster similarity, meaning that it tends to group slightly more objects sharing the same sensitive attribute value than the other methods. Nevertheless, its inter-cluster similarities are comparable with those of the other methods, indicating that CCBounds is still able to properly separate the objects into clusters when accounting for the sensitive attribute. When we instead focus on the similarity computed on the non-sensitive attributes, CCBounds achieves the best performance in all the considered evaluation scenarios, yielding very high-quality clusters.</p>
      </sec>
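      <p>One possible way to compute the average intra- and inter-cluster similarities used in this kind of validation is sketched below (our own illustration with a hypothetical <italic>intra_inter_similarity</italic> helper; the paper's exact estimator may aggregate differently):</p>

```python
import itertools

def intra_inter_similarity(labels, sim_matrix):
    # Average pairwise similarity within clusters (intra) and
    # across different clusters (inter), given a precomputed
    # symmetric similarity matrix over the objects.
    intra, inter = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        (intra if labels[i] == labels[j] else inter).append(sim_matrix[i][j])
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return avg(intra), avg(inter)
```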
      <sec id="sec-6-5">
        <p>Finally, we also investigated running times, finding Fairlets to be the best performer, followed by HST-FC and CCBounds, which both guarantee reasonable running times. Although CCBounds has quadratic time complexity due to the pairwise similarity calculations (cf. Section 4), we performed these time-consuming steps in parallel. On the contrary, Signed requires excessively long execution times, often proving infeasible in practice, and produces an abnormal number of clusters, which is particularly large even on the smallest 1k datasets. Overall, although the observed running times should be taken with a grain of salt due to the (lack of) code optimizations, the main findings are consistent with the time complexities of the corresponding methods.</p>
      </sec>
      <sec id="sec-6-6">
        <title>Discussion.</title>
        <p>A number of remarks arise from our experimental evaluation. First, although native fairness-aware approaches are able to produce clustering solutions that optimize fairness notions, we found that such a capability comes at a cost, as the produced clusters are often far from being qualitatively good. On the other hand, CCBounds proved to be effective and versatile: it was the best among the tested approaches at finding good-quality clusters, while not excessively penalizing fairness-related aspects.</p>
        <p>Second, although we unveiled the quality weaknesses of the native fair-clustering approaches, we also shed light on how the approaches based on correlation clustering might suffer from computational issues, being slower than the other methods and requiring more memory. This is particularly evident with Signed, which is unable to terminate on any dataset with more than 10k tuples; memory usage is instead kept under control by CCBounds, which fails only on Diabetes-Full (containing more than 100k tuples, cf. Section 6.2), thanks to the numerous optimizations adopted under the hood. However, such a dataset makes it difficult to calculate similarities even for traditional and more efficient approaches, despite the computing capabilities at our disposal.</p>
        <p>Finally, our approach achieves fairness-aware performance comparable to its direct competitor (Signed), while outperforming all considered methods in producing high-quality clusters, without compromising fairness.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>8. Conclusions</title>
      <p>We showed how the correlation clustering method CCBounds [<xref ref-type="bibr" rid="ref4">4</xref>] effectively addresses fair clustering. Experiments on real data confirm the quality of its solutions, which outperform those of competing methods on standard clustering metrics while preserving fairness.</p>
      <p>
        Future work includes evaluating CCBounds with multiple protected values, exploring alternative similarity definitions, and extending its applicability to more complex scenarios with multiple sensitive attributes and/or uncertainty [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], as well as to polarization detection [21], enhancing its practicality and versatility.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>D. Mandaglio and A. Tagarelli are partly supported by the PNRR Future AI Research (FAIR) project (H23C22000860006, M4C21.3 spoke 9). L. La Cava is supported by project SERICS (PE00000014), under the MUR National Recovery and Resilience Plan funded by the EU NextGenerationEU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Friedler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moeller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scheidegger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Venkatasubramanian</surname>
          </string-name>
          ,
          <article-title>Certifying and removing disparate impact</article-title>
          ,
          <source>in: Proc. ACM KDD Conf.</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <article-title>Correlation clustering</article-title>
          ,
          <source>Mach. Learn</source>
          .
          <volume>56</volume>
          (
          <year>2004</year>
          )
          <fpage>89</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gullo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mandaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <article-title>A combinatorial multi-armed bandit approach to correlation clustering</article-title>
          ,
          <source>DAMI</source>
          <volume>37</volume>
          (
          <year>2023</year>
          )
          <fpage>1630</fpage>
          -
          <lpage>1691</lpage>
          . doi:10.1007/S10618-023-00937-5.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gullo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. La</given-names>
            <surname>Cava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mandaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <article-title>When correlation clustering meets fairness constraints</article-title>
          ,
          <source>in: Proceedings of International Conference on Discovery Science</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>302</fpage>
          -
          <lpage>317</lpage>
          . doi:10.1007/978-3-031-18840-4_22.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chhabra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Masalkovaitė</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mohapatra</surname>
          </string-name>
          ,
          <article-title>An overview of fairness in clustering</article-title>
          ,
          <source>IEEE Access 9</source>
          (
          <year>2021</year>
          )
          <fpage>130698</fpage>
          -
          <lpage>130720</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chierichetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lattanzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vassilvitskii</surname>
          </string-name>
          ,
          <article-title>Fair clustering through fairlets</article-title>
          ,
          <source>in: Proc. NIPS Conf</source>
          .,
          <year>2017</year>
          , pp.
          <fpage>5029</fpage>
          -
          <lpage>5037</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Bera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chakrabarty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Flores</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Negahbani</surname>
          </string-name>
          ,
          <article-title>Fair algorithms for clustering</article-title>
          ,
          <source>in: Proc. NIPS Conf</source>
          .,
          <year>2019</year>
          , pp.
          <fpage>4955</fpage>
          -
          <lpage>4966</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I. O.</given-names>
            <surname>Bercea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Groß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khuller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rösner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <article-title>On the cost of essentially fair clusterings</article-title>
          ,
          <source>in: Proc. APPROX/RANDOM Conf.</source>
          ,
          <year>2019</year>
          , pp. 18:1-18:22.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kleindessner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Awasthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Morgenstern</surname>
          </string-name>
          ,
          <article-title>Fair k-center clustering for data summarization</article-title>
          ,
          <source>in: Proc. ICML Conf</source>
          .,
          <year>2019</year>
          , pp.
          <fpage>3448</fpage>
          -
          <lpage>3457</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Abraham</surname>
          </string-name>
          , D. P,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sundaram</surname>
          </string-name>
          ,
          <article-title>Fairness in clustering with multiple sensitive attributes</article-title>
          ,
          <source>in: Proc. EDBT Conf</source>
          .,
          <year>2020</year>
          , pp.
          <fpage>287</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Backurs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Indyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Onak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schieber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vakilian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <article-title>Scalable fair clustering</article-title>
          ,
          <source>in: Proc. ICML Conf</source>
          .,
          <year>2019</year>
          , pp.
          <fpage>405</fpage>
          -
          <lpage>413</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kleindessner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Awasthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Morgenstern</surname>
          </string-name>
          ,
          <article-title>Guarantees for spectral clustering with fairness constraints</article-title>
          ,
          <source>in: Proc. ICML Conf</source>
          .,
          <year>2019</year>
          , pp.
          <fpage>3458</fpage>
          -
          <lpage>3467</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Epasto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Knittel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mahdian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Moseley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vassilvitskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Fair hierarchical clustering</article-title>
          ,
          <source>in: Proc. NIPS Conf</source>
          .,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Epasto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mahdian</surname>
          </string-name>
          ,
          <article-title>Fair correlation clustering</article-title>
          ,
          <source>in: Proc. AISTATS Conf</source>
          .,
          <year>2020</year>
          , pp.
          <fpage>4195</fpage>
          -
          <lpage>4205</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mandaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gullo</surname>
          </string-name>
          ,
          <article-title>Correlation clustering with global weight bounds</article-title>
          ,
          <source>in: Proc. ECML-PKDD Conf.</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>499</fpage>
          -
          <lpage>515</lpage>
          . doi:10.1007/978-3-030-86520-7_31.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ailon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Charikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <article-title>Aggregating inconsistent information: Ranking and clustering</article-title>
          ,
          <source>JACM</source>
          <volume>55</volume>
          (
          <year>2008</year>
          )
          23:1-23:27.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Charikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Guruswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wirth</surname>
          </string-name>
          ,
          <article-title>Clustering with qualitative information</article-title>
          ,
          <source>JCSS</source>
          <volume>71</volume>
          (
          <year>2005</year>
          )
          <fpage>360</fpage>
          -
          <lpage>383</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Demaine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Emanuel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Immorlica</surname>
          </string-name>
          ,
          <article-title>Correlation clustering in general weighted graphs</article-title>
          ,
          <source>TCS</source>
          <volume>361</volume>
          (
          <year>2006</year>
          )
          <fpage>172</fpage>
          -
          <lpage>187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ailon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Charikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <article-title>Aggregating inconsistent information: ranking and clustering</article-title>
          ,
          <source>in: Proc. ACM STOC Symp</source>
          .,
          <year>2005</year>
          , pp.
          <fpage>684</fpage>
          -
          <lpage>693</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mandaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gullo</surname>
          </string-name>
          ,
          <article-title>In and out: Optimizing overall interaction in probabilistic graphs under clustering constraints</article-title>
          ,
          <source>in: Proc. ACM KDD Conf.</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1371</fpage>
          -
          <lpage>1381</lpage>
          . doi:10.1145/3394486.3403190.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L. La</given-names>
            <surname>Cava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mandaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <article-title>Polarization in decentralized online social networks</article-title>
          ,
          <source>in: Proceedings of the 16th ACM Web Science Conference</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>48</fpage>
          -
          <lpage>52</lpage>
          . doi:10.1145/3614419.3644013.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>