<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>An Associative Approach to Fair Co-clustering</article-title>
        <subtitle>(Discussion Paper)</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federico Peiretti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruggero G. Pensa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Co-clustering is a powerful data mining tool that extracts summary information from a data matrix by simultaneously computing row and column clusters that provide a compact representation of the data. However, if the matrix contains data about individuals, the co-clustering results may be influenced by the societal biases that are reproduced in the data. Despite the extensive research on fairness considerations in clustering, this issue has not been addressed in the context of co-clustering algorithms. This paper proposes a novel fair co-clustering algorithm based on an associative measure derived from the Goodman-Kruskal's τ, which has demonstrated good convergence properties. The algorithm ensures both clustering quality and fairness by implementing an in-process rebalancing mechanism inspired by the fair assignment problem. An extensive experimental validation is provided to demonstrate the efficacy of our approach.</p>
      </abstract>
      <kwd-group>
<kwd>Clustering</kwd>
        <kwd>Fairness</kwd>
        <kwd>High-dimensional data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Clustering results, as well as those of any other machine learning task, can be affected by the presence
of any sort of bias in the data. When the data are related to human beings, and clustering is used to
drive some critical decision process, such bias could lead to unfair or discriminatory outcomes towards
minority groups or protected categories, a situation known as disparate impact. To address this issue,
fair clustering has recently emerged as a solution aimed at mitigating the effects of existing biases in the
data [1]. Some examples of fair methods for clustering include balanced representation [2, 3, 4],
proportionally fair clustering [5] and equitable distance fairness [6]. However, when dealing with
high-dimensional data, most distance-based clustering techniques struggle to identify actual patterns
in the data, due to the effects of the well-known phenomenon of the curse of dimensionality. To cope
with this issue, co-clustering (the simultaneous partitioning of rows and columns of a data matrix)
has shown its effectiveness in many challenging scenarios, with different forms of data distributions
and matrix sparsity [7]. Co-clustering has another advantage: the partition on columns provides
explanatory patterns for the row clustering, and vice-versa, thus making co-clustering an intrinsically
interpretable unsupervised task. Unfortunately, co-clustering is even more seriously affected by
fairness issues than clustering. In fact, biases could affect either the row or the column partitioning, or
even both. Consider, for instance, a user × movie matrix recording the ratings given by each user to
some movies. Co-clustering can be used to group together similar users (exhibiting similar preferences)
and similar movies (liked by similar users). If the outcome of the co-clustering is used to perform
movie recommendation to users, suggestions might reflect societal biases present in the data and,
consequently, be deeply unfair. Worse than that, such suggestions may contribute to the reinforcement
of prejudices against demographic categories of people, thus making the data even more biased. Although fair
recommendation has been extensively addressed [8], it is worth noting that co-clustering is a more
general technique that can be used in different data analysis pipelines or knowledge discovery processes,
such as text mining [9], image segmentation [10], transfer learning [11], object detection and scene
categorization [10]. Despite its wide employment, to our knowledge, the problem of bias mitigation
in co-clustering has never been studied as such. The most similar approach uses co-clustering
within a fair recommendation framework [12]. However, while the whole process ensures unbiased
recommendations, the underlying co-clustering process is not entirely fair.</p>
<p>To fill this gap in the fair clustering literature, we propose a fair co-clustering algorithm based on an
associative measure known as the de-normalized Goodman-Kruskal's τ, that has good convergence
properties and does not require the final number of co-clusters to be defined a priori¹. We show
experimentally that our approach is effective in identifying fair co-clusters that mitigate the disparate
impact and, at the same time, still preserve a good quality. Additionally, we compare our algorithm with
a competitor that performs latent block modeling for fair recommendation and uses a fairer optimization
that could be used, in theory, to obtain unbiased co-clusters. However, we show that this is not sufficient
to pursue our goal, thus making our approach the first truly fair co-clustering method.</p>
      <p>¹The majority of co-clustering algorithms require the final number of row and column clusters as an input. In
contrast, our algorithm does not require such prior knowledge. Instead, it relies on an initial number of row and column
clusters during execution and, where possible, adapts to the data, determining the final number autonomously.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and motivation</title>
      <p>This section delves into fundamental concepts related to fairness and co-clustering, essential for
understanding the functionality of our proposed fair co-clustering algorithm.</p>
      <sec id="sec-2-1">
        <title>2.1. Fair clustering</title>
        <p>Fair clustering is a rapidly evolving field within algorithmic fairness in unsupervised learning, aiming
to prevent clustering algorithms from favoring specific demographics. A prominent fairness notion in
clustering is balance, initially introduced by Chierichetti et al. for two protected groups (e.g., Male and
Female) [2]. Bera et al. generalize the balance to accommodate multiple protected groups by ensuring
that the ratio of points from each group in every cluster matches the overall dataset ratio [3]. They
define balance as follows:
Definition 1 (Balance). The balance of a clustering 𝒞 is defined as:

balance(𝒞) = min_{C_i ∈ 𝒞, g ∈ G} min( r_g / r_g(C_i), r_g(C_i) / r_g )   (1)

where G is the set of protected groups, r_g is the ratio of the group g ∈ G in the dataset X, i.e.,
r_g = |X_g|/|X|, and r_g(C_i) is the ratio of the group g ∈ G in cluster C_i, i.e., r_g(C_i) = |g(C_i)|/|C_i|.</p>
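        <p>As an illustration, the balance of Definition 1 can be computed directly from the cluster assignments
and the protected-group labels. The following is a minimal NumPy sketch (function and variable names
are our own and purely illustrative):</p>
        <preformat>import numpy as np

def balance(labels, groups):
    """Balance of a clustering (Definition 1): the worst-case ratio between
    a group's share in a cluster and its share in the whole dataset."""
    ratios = {g: np.mean(groups == g) for g in np.unique(groups)}  # r_g
    b = 1.0
    for c in np.unique(labels):
        members = groups[labels == c]
        for g, r_g in ratios.items():
            r_gc = np.mean(members == g)  # r_g(C_i): share of g in cluster c
            if r_gc == 0:                 # an absent group yields balance 0
                return 0.0
            b = min(b, r_g / r_gc, r_gc / r_g)
    return b</preformat>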
<p>In this paper, we use the definition given by Gupta et al. [13]. They introduce the notion of τ-ratio
fairness, which ensures that each cluster contains a predefined fraction of points for each protected
attribute value.</p>
        <p>Definition 2 (τ-ratio fairness). Let τ = (τ_g)_{g=1}^{|G|} be a vector, where τ_g ∈ [0, 1] for all protected groups
g ∈ G. A clustering solution satisfies τ-ratio fairness if, for each cluster C_i and each protected group g, the
number of points belonging to the group g in C_i is at least τ_g · N_g, where N_g denotes the total number of
points belonging to group g, i.e., |g(C_i)| ≥ τ_g · N_g, with τ_g ∈ [0, 1/k].</p>
        <p>We denote the number of clusters with k. Specifically, when τ_g is set to 1/k, the definition is equivalent
to Definition 1.</p>
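        <p>Checking whether a clustering satisfies τ-ratio fairness reduces to counting group members per
cluster. A minimal sketch follows (our own helper, not taken from any reference implementation):</p>
        <preformat>import numpy as np

def satisfies_tau_ratio(labels, groups, tau):
    """Definition 2: every cluster must contain at least tau[g] * N_g points
    of each protected group g, where N_g is the group's total size."""
    for g, tau_g in tau.items():
        n_g = np.sum(groups == g)  # N_g: total size of group g
        for c in np.unique(labels):
            in_cluster = np.sum(groups[labels == c] == g)
            if tau_g * n_g > in_cluster:
                return False
    return True</preformat>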
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Fast Co-clustering</title>
<p>Fast-τCC [14] is a recent co-clustering algorithm that has good convergence properties and is also able
to identify a congruent number of clusters on rows and columns, starting from an initial overestimation.
Given a data matrix A = (a_{ij}) ∈ ℝ₊^{n×m}, a co-clustering of A is a pair (ℛ, 𝒞), where ℛ is a partition of
the rows and 𝒞 a partition of the columns of the matrix. The objective function of Fast-τCC is derived
from the Goodman and Kruskal's τ [15], and can be defined as follows:

τ̂_{R|C}(ℛ, 𝒞) = Σ_{r=1}^{|ℛ|} Σ_{c=1}^{|𝒞|} t_{rc}² / (T · t_{·c}) − Σ_{r=1}^{|ℛ|} t_{r·}² / T²   (2)

where T = (t_{rc}) is the contingency table associated to the co-clustering (ℛ, 𝒞), with ℛ =
(ℛ_1, . . . , ℛ_k) and 𝒞 = (𝒞_1, . . . , 𝒞_l), i.e., t_{rc} = Σ_{i∈ℛ_r} Σ_{j∈𝒞_c} a_{ij}, for r = 1, . . . , k and
c = 1, . . . , l. Following this notation, t_{r·} = Σ_{c=1}^{l} t_{rc}, t_{·c} = Σ_{r=1}^{k} t_{rc} and
T = Σ_{r=1}^{k} Σ_{c=1}^{l} t_{rc}. Analogously, the association of the column clustering 𝒞 to the row
clustering ℛ can be evaluated through the function τ̂_{C|R}(ℛ, 𝒞) (Eq. 3, obtained from Eq. 2 by exchanging
the roles of rows and columns). Since τ̂ is not symmetric, the best co-clustering solutions are those that
simultaneously maximize τ̂_{R|C} and τ̂_{C|R}. In [14] an iterative optimization strategy is introduced. It
alternates the computation of τ̂_{R|C} by fixing the column partition and the computation of τ̂_{C|R} by
keeping the row partition fixed.</p>
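        <p>To make Eq. 2 concrete, the following minimal NumPy sketch evaluates τ̂_{R|C} from a contingency
table (our own illustration, not the Fast-τCC implementation); τ̂_{C|R} is obtained by transposing the
table:</p>
        <preformat>import numpy as np

def tau_hat_rc(T_mat):
    """De-normalized Goodman-Kruskal tau (Eq. 2), from a |R| x |C|
    contingency table T_mat = (t_rc)."""
    T = T_mat.sum()                          # grand total
    t_rdot = T_mat.sum(axis=1)               # row marginals t_{r.}
    t_dotc = T_mat.sum(axis=0)               # column marginals t_{.c}
    first = (T_mat**2 / (T * t_dotc)).sum()  # sum_{r,c} t_rc^2 / (T t_{.c})
    second = (t_rdot**2).sum() / T**2        # sum_r t_{r.}^2 / T^2
    return first - second

def tau_hat_cr(T_mat):
    """Eq. 3: the same measure with the roles of rows and columns swapped."""
    return tau_hat_rc(T_mat.T)</preformat>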
</sec>
    </sec>
    <sec id="sec-2-3">
      <title>3. Fair Co-clustering</title>
      <p>In this section, we present Fair-τCC, a fair co-clustering method based on the de-normalized
Goodman-Kruskal's τ (see Eq. 2). We first define the problem of fairness in co-clustering, then describe the
algorithm for computing the co-clustering results in a fair manner.</p>
      <p>Definition 3 (Fair Co-clustering). Given a data matrix A and protected groups G_R = {g_0, . . . , g_m},
G_C = {g′_0, . . . , g′_p} referring to the row and column objects respectively, a co-clustering (ℛ, 𝒞) is fair if
both the row and column clusterings ℛ, 𝒞 are fair.</p>
      <p>Drawing inspiration from the definition of balance for clustering [2, 3], we define it for co-clustering
tasks. Ideally, a co-clustering is balanced if, for each protected group associated with the row (column)
objects, the ratio of its points in every row (column) cluster is the same as the ratio of its points over
the whole dataset.</p>
      <p>Definition 4 (Co-clustering Balance). Let s_R and s_C be sensitive features associated with the row
and column items, such that s_{R,i} ∈ G_R and s_{C,j} ∈ G_C, where s_{R,i} and s_{C,j} are the protected
groups the i-th row and j-th column items belong to, respectively. The balance of a co-clustering (ℛ, 𝒞) is
defined as:

balance({ℛ, 𝒞}) = min(balance(ℛ), balance(𝒞))</p>
      <p>The protected groups for both row and column objects are not always known. Therefore, if only
G_R (G_C) is known, the co-clustering is considered fair if the row (column) clustering is fair (i.e.,
balance(ℛ) ≈ 1). For simplicity, in this work we ensure fairness only for the protected groups of
row objects.</p>
      <sec id="sec-2-3-1">
        <title>3.1. Fair-τCC algorithm</title>
        <p>We now introduce Fair-τCC, the fair adaptation of the current state-of-the-art co-clustering method
proposed by Battaglia et al. [14]. The primary objective of this algorithm is to ensure a balanced
representation of each protected group in every row cluster. Specifically, it guarantees a minimum fraction
of points from each protected group in every cluster, adhering to the concept of τ-ratio fairness (see
Definition 2), hereinafter referred to as α-ratio fairness to avoid any ambiguity with the Goodman-Kruskal's τ.
The pseudocode for this algorithm is detailed in Algorithm 1, while the procedure for updating the row
clustering is illustrated in Algorithm 2.</p>
        <p>Algorithm 1 FairτCC(A, s, G, k, l, t_max, α)
Input: data matrix A, a sensitive feature s = [s_0, . . . , s_n], protected groups G = {g_0, . . . , g_m},
initial number of row and column clusters k and l, max number of iterations t_max, a vector
α = [α_0, . . . , α_|G|] with α_g ∈ [0, 1].
Result: R, C row and column clustering such that R satisfies α-ratio fairness (Definition 2).
Initialize R(0) and C(0);
t ← 1; check ← True;
while check and t &lt; t_max do
    check ← False;
    R(t) ← FairUpdateRowClusters(P, C(t−1), R(t−1), s, G, α);
    C(t) ← UpdateColumnClusters(P, R(t));
    t ← t + 1;
end</p>
        <p>First, we must introduce two matrices P = (p_{ij}) and Q = (q_{rc}), with p_{ij} = a_{ij}/S and
q_{rc} = t_{rc}/S, where S denotes the sum of all the entries of A (hence, S = T). We also introduce
the row cluster incidence matrix R = (r_{ir}) and the column cluster incidence matrix C = (c_{jc}), with
r_{ir} = 1 if row i is in row cluster ℛ_r (r_{ir} = 0 otherwise) and c_{jc} = 1 if column j is in column
cluster 𝒞_c. According to this notation,

Q = R⊤PC   (4)

Equation 2 can then be rewritten as:

τ̂_{R|C}(ℛ, 𝒞) = Σ_{r=1}^{k} Σ_{c=1}^{l} ( Σ_{i∈ℛ_r} p̃_{ic} ) q_{rc}/q_{·c} − Σ_{r=1}^{k} ( Σ_{i∈ℛ_r} p̃_{i·} ) q_{r·}   (5)

where p̃_{ic} = Σ_{j∈𝒞_c} p_{ij} is the generic entry of P̃ = PC, p̃_{i·} = Σ_{c=1}^{l} p̃_{ic},
q_{r·} = Σ_{c=1}^{l} q_{rc} = Σ_{i∈ℛ_r} p̃_{i·}, and q_{·c} = Σ_{r=1}^{k} q_{rc}.</p>
        <p>Let R(t) be the row cluster incidence matrix at iteration t, and Q(t) = R(t)⊤PC its associated
distribution. The objective function τ̂_{R|C}(ℛ(t), 𝒞) is

τ̂_{R|C}(ℛ(t), 𝒞) = Σ_{r=1}^{k} Σ_{i∈ℛ_r} ( Σ_{c=1}^{l} p̃_{ic} q_{rc}^{(t)}/q_{·c}^{(t)} − p̃_{i·} q_{r·}^{(t)} )

Each row q_r^{(t)} of Q(t) can be interpreted as a prototype of the r-th cluster of ℛ(t), and the following
similarity function between any row p̃_i of P̃ and q_r^{(t)} is defined:

f(p̃_i, q_r^{(t)}) = Σ_{c=1}^{l} p̃_{ic} q_{rc}^{(t)}/q_{·c}^{(t)} − p̃_{i·} q_{r·}^{(t)}   (6)

It measures the similarity between a “point” p̃_i and a cluster prototype q_r^{(t)}. The objective function
becomes

τ̂_{R|C}(ℛ(t), 𝒞) = Σ_{i=1}^{n} f(p̃_i, q_{r⋆}^{(t)})   (7)

where r⋆ = arg max_r f(p̃_i, q_r^{(t)}) is the cluster assignment maximizing function f.</p>
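        <p>The following sketch (our own notation, assuming integer cluster labels in 0..k−1 and 0..l−1)
shows how the prototype matrix Q of Eq. 4 and the similarity f of Eq. 6 can be computed with dense
incidence matrices:</p>
        <preformat>import numpy as np

def incidence(labels, n_clusters):
    """0/1 incidence matrix: entry (i, r) is 1 iff object i is in cluster r."""
    M = np.zeros((labels.size, n_clusters))
    M[np.arange(labels.size), labels] = 1.0
    return M

def similarity_f(P_tilde, Q):
    """f(p_i, q_r) of Eq. 6 for all pairs (i, r), where P_tilde = P @ C."""
    q_dotc = Q.sum(axis=0)        # q_{.c}: column sums of Q
    q_rdot = Q.sum(axis=1)        # q_{r.}: row sums of Q
    p_idot = P_tilde.sum(axis=1)  # p~_{i.}
    # f(p_i, q_r) = sum_c p~_{ic} q_{rc} / q_{.c}  -  p~_{i.} q_{r.}
    return P_tilde @ (Q / q_dotc).T - np.outer(p_idot, q_rdot)

# Usage sketch: P = A / A.sum(); R = incidence(row_labels, k);
# C = incidence(col_labels, l); Q = R.T @ P @ C     (Eq. 4)
# Sigma = similarity_f(P @ C, Q); r_star = Sigma.argmax(axis=1)  # r* of Eq. 7</preformat>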
        <p>Algorithm 2 FairUpdateRowClusters(P, C, R(t−1), s, G, α)
Input: matrix P, column clustering C, initial row clustering R(t−1), sensitive feature s =
[s_0, . . . , s_n], protected groups G = {g_0, . . . , g_m}, fairness parameters α = [α_0, . . . , α_|G|] with
α_g ∈ [0, 1].
Q(t−1) = R(t−1)⊤PC;
compute U(t−1) and V(t−1) as in Eq. 9;
Σ = PC(Q(t−1) ⊙ U(t−1) − V(t−1))⊤;
for i = 1, . . . , n do
    r⋆(i) ← arg max_r σ_{ir};
end
compute R(t) using r⋆;
remove empty clusters and update R(t);
if R(t) violates α-ratio fairness then
    R(t) = FairRowAssignments(R(t), Σ, s, G, α);
end
t ← t + 1;</p>
        <p>Algorithm 2 uses two k × l matrices U and V to compute all f values in an n × k matrix Σ = (σ_{ir}),
where σ_{ir} = f(p̃_i, q_r^{(t−1)}):

Σ = PC(Q^{(t−1)} ⊙ U^{(t−1)} − V^{(t−1)})⊤   (8)

where ⊙ indicates the Hadamard matrix product.</p>
        <p>The matrices U(t) and V(t) of Eq. 8 are defined entrywise as

u_{rc}^{(t)} = 1 / Σ_{r′=1}^{k} q_{r′c}^{(t)} = 1/q_{·c}^{(t)},   v_{rc}^{(t)} = Σ_{c′=1}^{l} q_{rc′}^{(t)} = q_{r·}^{(t)}   (9)

i.e., U(t) is the k × l matrix whose c-th column is constantly equal to the reciprocal of the c-th column
sum of Q(t), and V(t) is the k × l matrix whose r-th row is constantly equal to the r-th row sum of Q(t).
Then, the algorithm also removes all empty clusters. Hence, from one iteration to another, the number
of clusters may decrease, and R(0) and C(0) can be initialized with random partitions using safely high
values of k and l.</p>
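        <p>Under the same assumptions as before, the vectorized form of Eq. 8 can be sketched as follows; it
produces the same n × k matrix Σ as the pairwise computation of f, and the row update of Algorithm 2
is then a simple arg max over its columns:</p>
        <preformat>import numpy as np

def sigma_matrix(P, C, Q):
    """Sigma = PC (Q Hadamard U - V)^T (Eq. 8), with U, V as in Eq. 9."""
    U = 1.0 / Q.sum(axis=0)            # u_{rc} = 1/q_{.c}: constant per column
    V = Q.sum(axis=1, keepdims=True)   # v_{rc} = q_{r.}: constant per row
    M = Q * U - V                      # entry (r, c) = q_{rc}/q_{.c} - q_{r.}
    return (P @ C) @ M.T               # sigma_{ir} = f(p_i, q_r), shape n x k

# Row update of Algorithm 2: each row goes to its most similar prototype.
# r_star = sigma_matrix(P, C, Q).argmax(axis=1)</preformat>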
        <p>Given this initial assignment, we evaluate whether the optimal solution R⋆ satisfies the α-ratio
fairness property. If it does not, a fair assignment R^F is determined (see Algorithm 3). The trade-off
between fairness and clustering quality is managed through the utilization of the similarity matrix Σ.
Let s = [s_0, . . . , s_n] denote the sensitive feature associated with the rows of the data matrix, where
s_i ∈ G and G = {g_0, . . . , g_m} represents the set of protected groups. From the similarity matrix Σ, we
derive an n × k matrix D = (d_{ir}), defined as follows:

d_{ir} = f(p̃_i, q_{r⋆}) − f(p̃_i, q_r),   ∀r = 1, . . . , k   (10)</p>
        <p>Here, f(p̃_i, q_{r⋆}) indicates the similarity value between point i and its optimal cluster prototype q_{r⋆},
while f(p̃_i, q_r) represents the similarity value between point i and an alternative cluster prototype q_r.
Consequently, d_{ir} quantifies the loss in clustering quality when point i is allocated to cluster r instead
of its optimal cluster r⋆. To ensure optimal preservation of quality, it is important to determine the
sequence in which cluster prototypes for each point should be evaluated and the sequence in which
the points from the same protected group should be chosen. To do this, we sort the indices of the row
vector d_i, corresponding to the cluster prototypes of the point i, by value in ascending order. Then,
for each protected group, we sort points by d value in ascending order.</p>
        <p>Algorithm 3 FairRowAssignments(R⋆, Σ, s, G, α)
Input: The optimal row clustering R⋆, n × k similarity matrix Σ, sensitive feature s = [s_0, . . . , s_n],
protected groups G = {g_0, . . . , g_m}, fairness parameters α = (α_0, . . . , α_|G|) with α_g ∈ [0, 1],
∀g ∈ G.
Result: row clustering R^F that satisfies α-ratio fairness
Initialize R^F = 0_{(n×k)};
Compute D as in Eq. 10;
Sort cluster prototypes by d_{ir} values in ascending order, ∀i = 1, . . . , n;
Sort row objects by protected group and then by d value in ascending order;
for g in G do
    A_g = {p_i ∈ A s.t. s_i = g};  N_g = |A_g|;  n_g = (1/k) · α_g · N_g;
    for j = 1 . . . ⌊n_g⌋ do
        for r = 1 . . . k do
            p_i = arg min_{p_i ∈ A_g : Σ_{r′=1}^{k} R^F_{ir′} = 0} ( f(p_i, q_{r⋆}) − f(p_i, q_r) );
            R^F_{ir} = 1;
        end
    end
end
∀p_i ∈ A : Σ_{r=1}^{k} R^F_{ir} = 0, set R^F_{i,r⋆} = R⋆_{i,r⋆};</p>
        <p>For each protected group g, a fraction of unassigned row items equivalent to τ_g is chosen for
allocation to a non-optimal cluster with the aim of minimizing the loss value and ensuring fairness. The
parameter N_g denotes the number of points belonging to the protected group g. The fairness parameter
τ_g ∈ [0, 1] is the fraction of the N_g points to be allocated in each cluster. Specifically, it is defined as
τ_g = (1/k⋆) · α_g, where k⋆ represents the number of row clusters identified by the vanilla approach and
α_g ∈ [0, 1] is a user-defined parameter that quantifies the desired level of fairness. If α_g = 1.0 for a
group g, then the N_g points will be equally distributed across the k⋆ clusters (N_g/k⋆ points in each
cluster) and the group's ratio in each cluster matches its ratio in the overall dataset. Conversely, if
α_g = 0.0 for a group g, fairness violation is permitted for that group. If all groups have their parameters
set to zero (α_g = 0.0, ∀g ∈ G), any solution is acceptable, allowing for the selection of the optimal row
clustering. Notably, if α_g = 1.0 for all groups, the row clustering achieves perfect balance (balance ≈ 1.0);
otherwise, with α_g = 0.8, the 80% rule of the disparate impact doctrine is guaranteed. Finally, any points
that remain unallocated at the end of this procedure are assigned to their optimal cluster.</p>
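        <p>The greedy reassignment of Algorithm 3 can be sketched as follows. This is a simplified, hedged
illustration of the strategy described above (names, the dictionary-based α, and tie-breaking are our
own assumptions, not the authors' exact implementation):</p>
        <preformat>import numpy as np

def fair_row_assignments(Sigma, s, alpha):
    """Greedy fair reassignment: for each protected group g, distribute
    floor(alpha[g] * N_g / k) points to every cluster, always picking the
    still-unassigned point with the smallest quality loss d_ir (Eq. 10)."""
    n, k = Sigma.shape
    r_star = Sigma.argmax(axis=1)                     # optimal clusters
    D = Sigma[np.arange(n), r_star][:, None] - Sigma  # d_{ir} as in Eq. 10
    assigned = np.full(n, -1)                         # -1: not yet assigned
    for g, alpha_g in alpha.items():
        idx = np.where(s == g)[0]                     # points of group g
        quota = int(np.floor(alpha_g * idx.size / k)) # points per cluster
        for _ in range(quota):
            for r in range(k):                        # one point per cluster
                free = idx[assigned[idx] == -1]
                if free.size == 0:
                    break
                i = free[np.argmin(D[free, r])]       # smallest loss for r
                assigned[i] = r
    leftovers = assigned == -1                        # unallocated points
    assigned[leftovers] = r_star[leftovers]           # go to optimal cluster
    return assigned</preformat>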
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
<p>In this section, we present the findings from our experiments conducted on four real-world datasets
to evaluate the effectiveness of Fair-τCC. In our experiments, we use four high-dimensional datasets:
two ratings datasets (MovieLens-1M [16] and Yelp [17, 18]), a product reviews dataset (Amazon
reviews [17, 18]), and an image collection for facial recognition (Labeled Faces in the Wild [19]). Table 1
summarizes the characteristics of the data matrix for each dataset utilized in our experiments. We
compared our algorithm against the standard version of Fast-τCC, which does not incorporate fairness
constraints, and the only closely related competitor, Parity LBM [12], a latent block model designed for
fair recommendations independent of protected attributes. To assess the performance of each algorithm
regarding co-clustering quality and fairness, we employed several evaluation metrics:
• τ_{R|C} and τ_{C|R}: the Goodman-Kruskal's τs measuring the quality of the row and column clustering
predicted by both versions of the τCC algorithm.
• ARI: the Adjusted Rand Index. It is used to compare the agreement between the row and column
assignments predicted by the fair algorithms and those from the corresponding vanilla approach
(ARIrows and ARIcols in Table 2). Additionally, it is used to compute the agreement between the
clustering and the given ground-truth labels detailed in Table 1 (ARI in Table 2).
• Balance: this metric quantifies the balanced representation of protected groups within each
cluster according to Definition 1.
• Kullback-Leibler fairness error: based on the Kullback-Leibler divergence as proposed in [20], it
quantifies the fairness error in clustering. Lower KL error values indicate better adherence to
fairness constraints (see the sketch after this list).</p>
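      <p>For clarity, here is one plausible reading of the KL-based fairness error, sketched in NumPy: for each
cluster, the KL divergence between the dataset-level group distribution and the cluster-level one, summed
over clusters (the exact weighting used in [20] may differ):</p>
      <preformat>import numpy as np

def kl_fairness_error(labels, groups, eps=1e-12):
    """Sum over clusters of KL(dataset group distribution ||
    cluster group distribution); 0 means perfectly balanced clusters."""
    gs = np.unique(groups)
    u = np.array([np.mean(groups == g) for g in gs])       # dataset ratios
    err = 0.0
    for c in np.unique(labels):
        members = groups[labels == c]
        p = np.array([np.mean(members == g) for g in gs])  # cluster ratios
        err += np.sum(u * np.log(u / (p + eps)))
    return err</preformat>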
<p>To evaluate the effectiveness of our algorithm, we set the number of initial clusters for both rows and
columns to k = 10 and l = 10, respectively. Furthermore, to adjust the trade-off between the
level of fairness and co-clustering efficiency, we ran the experiments varying all α values within
the range [0, 1] for all protected groups. Conversely, Parity LBM was executed with hyperparameters
configured as follows: 25 row and column clusters to be found for the MovieLens dataset and 10 for all
the others; a maximum number of 300 epochs for the training, and a learning rate of 2e-2.</p>
      <sec id="sec-3-1">
        <title>4.1. Results</title>
<p>In Table 2, we report the performance of Fair-τCC in comparison with its vanilla version (Fast-τCC),
the direct competitor (Parity LBM) and its non-fair counterpart (LBM). We present two versions of
our algorithm: the first with a maximum fairness constraint (Fair-τCC), and the second with a more
relaxed fairness constraint allowing a small violation for only one protected group (Fair-τCCweak). For
the MovieLens (ML) dataset with age as sensitive attribute, having three protected groups, the identification
of a fair row clustering is more challenging, as it engenders greater problem complexity and necessitates
the allocation of additional computational resources for its resolution. Consequently, in such an instance,
we allow a minor infringement of the constraint for two protected groups. The α values of the relaxed
version are selected from two values, 0.9 and 1.0, by maximizing the row clustering quality τ_{R|C}.</p>
<p>Overall, the results demonstrate that Fair-τCC consistently delivers superior fairness performance
across all datasets while maintaining reasonable clustering quality relative to its vanilla counterpart,
Fast-τCC, and the other competitors (Parity LBM and standard LBM). Allowing slight violations of the
fairness constraints, even for a single protected group, can lead to an improvement in terms of
clustering quality, while achieving substantial gains in fairness compared to the non-fair methods. This
trade-off makes Fair-τCCweak particularly suitable for applications where both fairness and clustering
effectiveness are critical considerations.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
<p>We have introduced an algorithm that computes a co-clustering under fairness constraints. It seeks a
trade-off between cluster quality and balance by adopting an optimization strategy accounting for the
protected groups data instances belong to, and by exploiting the properties of a co-clustering approach based
on an associative statistical measure that has some desirable properties: it leads to fast convergence
and to the identification of a congruent number of clusters on both rows and columns starting from an
initial overestimation. The experiments have shown that our algorithm is effective also when compared
with the only existing competitor, a fair recommendation approach based on co-clustering. As future
work, we will study the co-clustering problem under the individual fairness setting and we will consider
multi-objective optimization as a way to automatically select optimal quality-fairness trade-offs.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
<p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[1] A. Chhabra, K. Masalkovaite, P. Mohapatra, An overview of fairness in clustering, IEEE Access 9 (2021) 130698–130720.</p>
      <p>[2] F. Chierichetti, R. Kumar, S. Lattanzi, S. Vassilvitskii, Fair clustering through fairlets, in: Proc. NIPS 2017, 2017, pp. 5029–5037.</p>
      <p>[3] S. K. Bera, D. Chakrabarty, N. Flores, M. Negahbani, Fair algorithms for clustering, in: Proc. NeurIPS 2019, 2019, pp. 4955–4966.</p>
      <p>[4] I. O. Bercea, M. Groß, S. Khuller, A. Kumar, C. Rösner, D. R. Schmidt, M. Schmidt, On the cost of essentially fair clusterings, in: Proc. APPROX/RANDOM 2019, volume 145, 2019, pp. 18:1–18:22.</p>
      <p>[5] X. Chen, B. Fain, L. Lyu, K. Munagala, Proportionally fair clustering, in: Proc. ICML 2019, volume 97, 2019, pp. 1032–1041.</p>
      <p>[6] D. Chakrabarti, J. P. Dickerson, S. A. Esmaeili, A. Srinivasan, L. Tsepenekas, A new notion of individually fair clustering: α-equitable k-center, in: Proc. AISTATS 2022, volume 151, 2022, pp. 6387–6408.</p>
      <p>[7] E. Battaglia, F. Peiretti, R. G. Pensa, Co-clustering: A survey of the main methods, recent trends, and open problems, ACM Comput. Surv. 57 (2025) 48:1–48:33.</p>
      <p>[8] Y. Zhao, Y. Wang, Y. Liu, X. Cheng, C. C. Aggarwal, T. Derr, Fairness and diversity in recommender systems: A survey, ACM Trans. Intell. Syst. Technol. 16 (2025) 2:1–2:28.</p>
      <p>[9] Y. Chen, Z. Lei, Y. Rao, H. Xie, F. L. Wang, J. Yin, Q. Li, Parallel non-negative matrix tri-factorization for text data co-clustering, IEEE Trans. Knowl. Data Eng. 35 (2023) 5132–5146.</p>
      <p>[10] M. Keuper, S. Tang, B. Andres, T. Brox, B. Schiele, Motion segmentation &amp; multiple object tracking by correlation co-clustering, IEEE Trans. Pattern Anal. Mach. Intell. 42 (2020) 140–153.</p>
      <p>[11] P. Zeng, Z. Lin, coupleCoC+: An information-theoretic co-clustering-based transfer learning framework for the integrative analysis of single-cell genomic data, PLoS Computational Biology 17 (2021) e1009064.</p>
      <p>[12] G. Frisch, J. Léger, Y. Grandvalet, Co-clustering for fair recommendation, in: Proc. of BIAS 2021, co-located with ECML PKDD 2021, volume 1524 of CCIS, Springer, 2021, pp. 607–630.</p>
      <p>[13] S. Gupta, G. Ghalme, N. C. Krishnan, S. Jain, Efficient algorithms for fair clustering with a new notion of fairness, Data Min. Knowl. Discov. 37 (2023) 1959–1997.</p>
      <p>[14] E. Battaglia, F. Peiretti, R. G. Pensa, Fast parameterless prototype-based co-clustering, Mach. Learn. 113 (2024) 2153–2181.</p>
      <p>[15] L. A. Goodman, W. H. Kruskal, Measures of association for cross classification, Journal of the American Statistical Association 49 (1954) 732–764.</p>
      <p>[16] F. M. Harper, J. A. Konstan, The movielens datasets: History and context, ACM Trans. Interact. Intell. Syst. 5 (2016) 19:1–19:19.</p>
      <p>[17] O. Fan-Osuala, Gender-based differences in online reviews: An empirical investigation, in: H. Krcmar, J. Fedorowicz, W. F. Boh, J. M. Leimeister, S. Wattal (Eds.), Proc. ICIS 2019, 2019.</p>
      <p>[18] O. Fan-Osuala, Gender Bias In Online Reviews, 2020. doi:10.6084/m9.figshare.12834617.v4.</p>
      <p>[19] G. B. Huang, M. Mattar, T. Berg, E. Learned-Miller, Labeled faces in the wild: A database for studying face recognition in unconstrained environments, in: Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, 2008.</p>
      <p>[20] I. M. Ziko, J. Yuan, E. Granger, I. B. Ayed, Variational fair clustering, in: Proc. AAAI 2021, 2021, pp. 11202–11209.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>