=Paper=
{{Paper
|id=Vol-3284/7682
|storemode=property
|title=On the Parametrized Complexity of the s-Club Cluster Edge Deletion Problem
|pdfUrl=https://ceur-ws.org/Vol-3284/7682.pdf
|volume=Vol-3284
|authors=Fabrizio Montecchiani,Giacomo Ortali,Tommaso Piselli,Alessandra Tappini
|dblpUrl=https://dblp.org/rec/conf/ictcs/MontecchianiOPT22
}}
==On the Parametrized Complexity of the s-Club Cluster Edge Deletion Problem==
On the Parameterized Complexity of the π -Club Cluster Edge Deletion Problem (Short Paper) Fabrizio Montecchiani, Giacomo Ortali* , Tommaso Piselli and Alessandra Tappini UniversitΓ degli Studi di Perugia, Italy Abstract We study the parameterized complexity of the π -Club Cluster Edge Deletion problem: Given a graph πΊ and two integers π β₯ 2 and π β₯ 1, is it possible to remove at most π edges from πΊ such that each connected component of the resulting graph has diameter at most π ? This problem is known to be NP-hard already when π = 2. We prove that it admits a fixed-parameter tractable algorithm when parameterized by π and the treewidth of the input graph. Keywords π -Club Cluster Edge Deletion, Parameterized Complexity, π -Club, Treewidth 1. Introduction Graph clustering [1] is a classical task in data mining, with important applications in numerous fields including computational biology [2], image processing [3], and machine learning [4]. A prominent graph-theoretic formalization is Cluster Editing (also known as Correlation Clustering): Given a graph πΊ and an integer π as input, the goal is to find a sequence of π operations, each of which can be an edge insertion or removal, such that the resulting graph is a so-called cluster graph, i.e., each of its connected components is a clique. If we restrict the editing operations to be edge removals only, then the problem is known as Cluster Edge Deletion. Equivalently, we seek for a partition of the vertices of πΊ into cliques, such that the inter- cluster edges (whose end-vertices belong to different cliques) are at most π. Unfortunately, both Cluster Editing and Cluster Edge Deletion are well-known to be NP-complete [5, 6]. Indeed, their parameterized complexity with respect to the natural parameter π has been intensively investigated; in particular, both problems are in FPT [7, 8], but do not allow subexponential-time parameterized algorithms unless ETH fails [9, 6]. In many applications, modelling clusters with cliques might be a severe limitation, for instance, in presence of noise in the data collection process. Consequently, several notions of relaxed cliques have been introduced and investigated [10, 11]. We focus on the concept of π -club, in Proceedings of the 23rd Italian Conference on Theoretical Computer Science, Rome, Italy, September 7-9, 2022 * Corresponding author. " fabrizio.montecchiani@unipg.it (F. Montecchiani); giacomo.ortali@unipg.it (G. Ortali); tommaso.piselli@studenti.unipg.it (T. Piselli); alessandra.tappini@unipg.it (A. Tappini) 0000-0002-0543-8912 (F. Montecchiani); 0000-0002-4481-698X (G. Ortali); 0000-0001-9192-2067 (A. Tappini) Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) which each pair of vertices is at distance at most π β₯ 2 in the cluster. (Note that a 1-club is in fact a clique.) We remark that defining clusters as π -clubs proved to be effective in several application scenarios such as social network analysis and bioinformatics [12, 13, 14, 15, 16]. The π -Club Cluster Edge Deletion problem can be stated analogously as Cluster Edge Deletion by replacing cliques with π -clubs (formal definitions are given later). Unfortunately, π -Club Cluster Edge Deletion is NP-complete already for π = 2 [17]. Also, 2-Club Cluster Edge Deletion belongs to FPT parameterized by π [18, 17], and it admits no subexponential-time parameterized algorithm in π [19]. More in general, for any π β₯ 2, π -Club Cluster Edge Deletion cannot be solved in time 2π(π) ππ(1) unless ETH fails [19]. Based on the above discussion, we know that it is unlikely that π -Club Cluster Edge Dele- tion lies is FPT when parameterized by π , whereas the complexity of the problem parameterized by π + π is open to the best of our knowledge. In this paper, we instead focus on those scenarios in which the solution size (measured by π) is large, and we still aim for tractable problems based on alternative parameterizations. In this respect, treewidth is a central parameter in the parameterized complexity analysis (see [20, 21]). We prove that π -Club Cluster Edge Deletion lies in FPT when parameterized by π + tw, where tw is an upper bound for the treewidth of the input graph. Theorem 1 Let πΊ be an π-vertex graph of treewidth at most tw. There is an algorithm that solves π(tw2 log π ) the π -Club Cluster Edge Deletion problem on πΊ in π(22 Β· π) time. From the technical viewpoint, the main crux of our approach lies in the definition of suf- ficiently small records that allow to keep track of the distances between pairs of vertices in a (partial) π -club. With such records at hand, we then apply a standard DP algorithm over a tree decomposition of the input graph, which still requires a nontrivial amount of technicalities in order to update the records. Our records have similarities, but also several key differences, with those used in a technique presented by Dondi and Lafond in [22, Thm. 14], which solves a related problem for the restricted case π = 2. For space constraints many technicalities are omitted and we only sketch the proof of Theorem 1. See [23] for the full version of the paper. Preliminaries and notation. For any π β Z+ , we use [π] as shorthand for the set {1, 2, . . . , π}. Let πΊ = (π, πΈ) be a graph. For any π β π , we denote by πΊ[π ] the subgraph of πΊ induced by the vertices of π . The neighborhood of a vertex π£ of πΊ is defined as ππΊ (π£) = {π’ : π’π£ β πΈ}. Given two vertices π’, π£ β π , the distance in πΊ between π’ and π£, denoted by ππΊ (π’, π£), is the number of edges in any shortest path between π’ and π£ in πΊ. The diameter of πΊ is the maximum distance in πΊ between any two of its vertices. An π -club of πΊ, with π β₯ 1, is a subset π β π such that the diameter βοΈπof πΊ[π ] is at most π . A partition of πΊ is a collection of subsets π = {πΆπ }πβ[π] such that: (a) π=1 πΆπ = π , and (b) πΆπ β© πΆπ = β for each π, π β [π] with π ΜΈ= π. We denote by πΈπ the set of all edges π’π£ of πΊ such that π’, π£ β πΆπ , for some π β [π]. We study the following problem. π -Club Cluster Edge Deletion Input: πΊ = (π, πΈ), π β₯ 1, π β₯ 2. Output: A partition π = {πΆπ }πβ[π] of πΊ such that πΆπ is an π -club for each π β [π], and |πΈ β πΈπ | β€ π. In what follows, for a graph πΊ = (π, πΈ), the pair (π³ , π ) denotes a nice tree-decomposition, such that π³ = {ππ }πβ[β] is a collection of subsets of vertices of π , called bags, and π is a tree whose nodes are in one-to-one correspondence with the elements of π³ . We point the reader to [24, 25] for the required background. 2. Sketch of the Proof of Theorem 1 The proof is based on a DP algorithm over a nice tree-decomposition. We first describe the records to be stored at each bag, and we then sketch the algorithm. Definition of the records. Let πΊ = (π, πΈ) be an π-vertex graph and let (π³ , π ) be a nice tree-decomposition of πΊ of width tw. For each π β [β], let ππ be the subtree of π rooted at the bag ππ β π³ and let πΊπ = (ππ , πΈπ ) be the subgraph of πΊ induced by the vertices that belong to at least one bag of ππ . A subset of vertices πΆ β ππ is a potential π -club, and we let ππΆ = πΆ β© ππ and int(πΆ) = πΆ β ππ . The first item of the record is a table that stores the pairwise distances of the vertices in ππΆ. Namely, let π·(ππΆ) be a table having one row and one column for each vertex in ππΆ, and such that: if π = π β§ β¨0, βͺ π·(ππΆ)[π, π] = ππΊ[πΆ] (π, π), if 1 β€ ππΊ[πΆ] (π, π) β€ π otherwise. βͺ β, β© The second item is a table that stores the distance between pairs of vertices such that one is in ππΆ and the other is in int(πΆ). Two vertices π’, π’β² in int(πΆ) are equivalent with respect to ππΆ, if for each vertex π β ππΆ, then either 1 β€ ππΊ[πΆ] (π’, π) = ππΊ[πΆ] (π’β² , π) β€ π , or ππΊ[πΆ] (π’, π) > π and ππΊ[πΆ] (π’β² , π) > π . Namely, let π»(ππΆ) be a table having one column for each vertex π β ππΆ, and one row for each equivalence class with respect to ππΆ, denoted by [π’]ππΆ . We have: {οΈ ππΊ[πΆ] (π’, π), if 1 β€ ππΊ[πΆ] (π’, π) β€ π π»(ππΆ)[π’, π] = β, otherwise. The third (and last) item of the record represents the key difference to extend the result in [22] to π β₯ 2. Suppose that πΆ is a subset of an π -club πΆ β² of πΊ and that there exist two vertices π’, π’β² β int(πΆ) whose distance in πΊ[πΆ] is larger than π . Then, any path between π’ and π’β² not containing two vertices in ππΆ has length larger than π . Hence, since ππΆ is a separator of πΊ, we have to consider paths in πΊ between π’ and π’β² going through some pair of vertices in ππΆ. We formalize this observation. Let π€, π§ β int(πΆ) be two vertices such that ππΊ[πΆ] (π€, π§) > π . A request for ππΆ, denoted by π π€π§ , is a table having one row and one column for each vertex in ππΆ. Namely, for each π, π β ππΆ, if there exists 2 β€ πΏ β€ π β 2 such that connecting π and π with a path π of length πΏ makes the distance between π€ and π§ to be at most π , then π π€π§ [π, π] = πΏ, while π π€π§ [π, π] = β otherwise. Observe that if there exist two requests π π€π§ and π π€β² π§ β² such that π π€π§ [π, π] = π π€β² π§ β² [π, π] for each pair π, π β ππΆ, then π€ and π€β² are equivalent with respect to ππΆ (i.e., π€, π€β² β [π€]ππΆ ), and the same holds for π§ and π§ β² . Thus, we avoid storing duplicated requests and we denote by π(ππΆ) the set containing all distinct requests for ππΆ. If a potential π -club πΆ is such that ππΆ = β (recall that πΆ β ππ ), then we call it complete. Consider a partitioning π«ππ of πΊπ into potential π -clubs and let πππ = {πΆπ,π π | π β [π ]} be the π potential π -clubs in π«ππ that are not complete. Let ππππ = {ππΆπ,π π | π β [π ]}, π π = {π·(ππΆ π ) | π π π,π π β [ππ ]}, βππ = {π»(ππΆπ,π π ) | π β [π ]}, and π¬π = {π(ππΆ π ) | π β [π ]}. A solution of π is a π π π,π π π tuple πππ = β¨ππππ , πππ , βππ , π¬ππ , πππ β©. Here πππ is an integer, called edge-counter, equal to |πΈπ βπ«ππ (πΈπ )|, hence πππ β€ π. Two solutions πππ = β¨ππππ , πππ , βππ , π¬ππ , πππ β© and πππ = β¨ππππ , πππ , βππ , π¬ππ , πππ β© are distinct if ππππ ΜΈ= ππππ , or πππ ΜΈ= πππ , or βππ ΜΈ= βππ , or π¬ππ ΜΈ= π¬ππ . Observe that if πππ and πππ are not distinct but πππ < πππ , then it suffices to consider only πππ . π(tw2 log π ) Lemma 1 For a bag ππ , there exist π(22 ) distinct solutions. Sketch of the algorithm. Let ππ be the current bag visited by the algorithm. We compute the set of solutions for ππ based on the solutions computed for its child or children. If the resulting set of solutions is empty, the algorithm halts and returns a negative answer. The running time of the algorithm follows from Lemma 1. We only describe the case in which ππ is an introduce bag. The cases in which ππ is a leaf, a forget, or a join bag are omitted in this extended abstract. ππ is an introduce bag. Let ππ = ππ β {π£} be the child of ππ . The algorithm exhaustively extends each solution πππ of ππ as follows. It first generates at most ππ new partitions by placing π£ in each ππΆ β² β ππππ . Also, it generates a partition in which π£ forms a new potential π -club πΆ = ππΆ = {π£}. Consider one of the new partitions generated by the algorithm. In order to build the corresponding new solution for ππ , we distinguish the following two cases. Case A (ππΆ = {π£}). π·(ππΆ) is trivially defined, π»(ππΆ) and π(ππΆ) are empty. Case B (ππΆ = ππΆ β² βͺ {π£}). The next observation immediately follows from the fact that ππΆ = ππΆ β² βͺ {π£} and int(πΆ) = int(πΆ β² ). Observation 1 Suppose that there exist π, π β ππΆ β² such that ππΊ[πΆ β² ] (π, π) > ππΊ[πΆ] (π, π), then any shortest path between π and π in πΊ[πΆ] contains vertex π£. β Computing π·(ππΆ) from π·(ππΆ β² ). 1. We add a new row and a new column for vertex π£. 2. For each vertex π β ππΆ β² , let πΏππ£ = minπβππΊ[π ] (π£) π·(ππΆ β² )[π, π], and note that πΏππ£ = 0 if π edge ππ£ belongs to πΊ[πΆ]. Clearly, it holds that {οΈ β, if πΏππ£ β {π , β} π·(ππΆ)[π, π£] = 1 + πΏππ£ , otherwise. 3. By Observation 1, for each pair π, π β ππΆ β² , the corresponding value of π·(ππΆ) can be updated as follows: π·(ππΆ)[π, π] = min{π·(ππΆ β² )[π, π], π·(ππΆ)[π, π£] + π·(ππΆ)[π, π£]}. β Computing π»(ππΆ) from π»(ππΆ β² ). 1. We add a new column for vertex π£. 2. For each equivalence class [π’]ππΆ β² , let πΏπ’π£ = minπβππΊ[π ] (π£) π»(ππΆ β² )[π’, π]. Since there is π no edge π’π£ such that π’ β int(πΆ), it follows that {οΈ β, if πΏπ’π£ β {π , β} π»(ππΆ)[π’, π£] = 1 + πΏπ’π£ , otherwise. 3. By Observation 1, for each pair of vertices π’ β int(πΆ β² ) and π β ππΆ β² , the corresponding value of π»(ππΆ) can be updated as follows: π»(ππΆ)[π’, π] = min{π»(ππΆ β² )[π’, π], π»(ππΆ)[π’, π£] + π·(ππΆ)[π£, π]}. β Computing π(ππΆ) from π(ππΆ β² ). Note that the addition of π£ cannot lead to new requests but it may actually yield the update of some request in π(ππΆ β² ). 1. For each request π π€π§ in π(ππΆ β² ), we verify whether, as a consequence of the introduction of π£, there exists a cell π π€π§ [π, π] such that π·(ππΆ)[π, π] β€ π π€π§ [π, π]. If such a cell exists, we say that π π€π§ is fulfilled. We add π π€π§ to π(ππΆ) if and only if π π€π§ is not fulfilled. 2. If π π€π§ is not fulfilled, before adding it to π(ππΆ), we update it as follows: a) We add a row and a column for π£. b) For each pair π, π β ππΆ β² , we compute πΏππ = min{(π»(ππΆ)[π€, π] + π»(ππΆ)[π§, π], π»(ππΆ)[π§, π] + π»(ππΆ)[π€, π]}. Observe that πΏππ + π·(ππΆ)[π, π] > π , otherwise the request would have been fulfilled before. c) By definition of request, we have π π€π§ [π, π] = π βπΏππ , if πΏππ < π β1, and π π€π§ [π, π] = β, otherwise. Finally, in both Case A and Case B, we observe that, in order to obtain the edge-counter of the new solution, πππ needs to be increased by the number of edges incident to π£ whose other end-vertex is in ππ but not in πΆ. If the resulting edge-counter is greater than π, the solution is discarded. 3. Discussion and Open Problems We have shown that the π -Club Cluster Edge Deletion problem parameterized by π + tw (where tw bounds the treewidth of the input graph) belongs to FPT. On the other hand, we know that the problem parameterized by π alone is paraNP-hard. It remains open the complexity of π -Club Cluster Edge Deletion parameterized by tw alone. We conclude by remarking that our approach can be slightly modified to solve a related problem, namely π -Club Cluster Vertex Deletion, in which we seek for π vertices whose removal yields a set of disjoint π -clubs. References [1] S. E. Schaeffer, Graph clustering, Comput. Sci. Rev. 1 (2007) 27β64. doi:10.1016/j. cosrev.2007.05.001. [2] A. Ben-Dor, R. Shamir, Z. Yakhini, Clustering gene expression patterns, J. Comput. Biol. 6 (1999) 281β297. doi:10.1089/106652799318274. [3] Z. Wu, R. M. Leahy, An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 15 (1993) 1101β1113. doi:10.1109/34.244673. [4] N. Bansal, A. Blum, S. Chawla, Correlation clustering, Mach. Learn. 56 (2004) 89β113. doi:10.1023/B:MACH.0000033116.57574.95. [5] R. Shamir, R. Sharan, D. Tsur, Cluster graph modification problems, Discret. Appl. Math. 144 (2004) 173β182. doi:10.1016/j.dam.2004.01.007. [6] C. Komusiewicz, J. Uhlmann, Cluster editing with locally bounded modifications, Discret. Appl. Math. 160 (2012) 2259β2270. doi:10.1016/j.dam.2012.05.019. [7] S. BΓΆcker, A golden ratio parameterized algorithm for cluster editing, J. Discrete Algorithms 16 (2012) 79β89. doi:10.1016/j.jda.2012.04.005. [8] J. Chen, J. Meng, A 2k kernel for the cluster editing problem, J. Comput. Syst. Sci. 78 (2012) 211β220. doi:10.1016/j.jcss.2011.04.001. [9] F. V. Fomin, S. Kratsch, M. Pilipczuk, M. Pilipczuk, Y. Villanger, Tight bounds for parame- terized complexity of cluster editing with a small number of clusters, J. Comput. Syst. Sci. 80 (2014) 1430β1447. doi:10.1016/j.jcss.2014.04.015. [10] B. Balasundaram, F. M. Pajouh, Graph Theoretic Clique Relaxations and Applications, Springer New York, 2013, pp. 1559β1598. doi:10.1007/978-1-4419-7997-1\_9. [11] C. Komusiewicz, Multivariate algorithmics for finding cohesive subnetworks, Algorithms 9 (2016) 21. doi:10.3390/a9010021. [12] R. Alba, A graph-theoretic definition of a sociometric clique, Journal of Mathematical Sociology 3 (1973) 113β126. doi:10.1080/0022250X.1973.9989826. [13] B. Balasundaram, S. Butenko, S. Trukhanov, Novel approaches for analyzing biological networks, J. Comb. Optim. 10 (2005) 23β39. doi:10.1007/s10878-005-1857-x. [14] s. Laan, M. Marx, R. Mokken, Close communities in social networks: Boroughs and 2-clubs, SSRN Electronic Journal (2015). doi:10.2139/ssrn.2686127. [15] R. Mokken, Cliques, clubs and clans, Quality & Quantity: International Journal of Methodology 13 (1979) 161β173. doi:10.1007/BF00139635. [16] R. Mokken, S. Laan, Close communication and 2-clubs in corporate networks: Europe 2010, Social Network Analysis and Mining 6 (2016). doi:10.1007/s13278-016-0345-x. [17] H. Liu, P. Zhang, D. Zhu, On editing graphs into 2-club clusters, in: FAW-AAIM, volume 7285 of LNCS, Springer, 2012, pp. 235β246. doi:10.1007/978-3-642-29700-7\_22. [18] F. N. Abu-Khzam, N. Makarem, M. Shehab, An improved fixed-parameter algorithm for 2-club cluster edge deletion, CoRR abs/2107.01133 (2021). URL: https://arxiv.org/abs/2107. 01133. [19] N. Misra, F. Panolan, S. Saurabh, Subexponential algorithm for d-cluster edge deletion: Exception or rule?, J. Comput. Syst. Sci. 113 (2020) 150β162. doi:10.1016/j.jcss.2020. 05.008. [20] R. G. Downey, M. R. Fellows, Parameterized Complexity, Monographs in Computer Science, Springer, 1999. [21] N. Robertson, P. D. Seymour, Graph minors. II. Algorithmic aspects of tree-width, J. Algorithms 7 (1986) 309β322. [22] R. Dondi, M. Lafond, On the tractability of covering a graph with 2-clubs, in: L. A. Gasieniec, J. Jansson, C. Levcopoulos (Eds.), FCT 2019, volume 11651 of LNCS, Springer, 2019, pp. 243β257. doi:10.1007/978-3-030-25027-0\_17. [23] F. Montecchiani, G. Ortali, T. Piselli, A. Tappini, On the parameterized complexity of the π -club cluster edge deletion problem, CoRR abs/2205.10834 (2022). doi:10.48550/arXiv. 2205.10834. [24] H. L. Bodlaender, T. Kloks, Efficient and constructive algorithms for the pathwidth and treewidth of graphs, J. Algorithms 21 (1996) 358β402. doi:10.1006/jagm.1996.0049. [25] T. Kloks, Treewidth, Computations and Approximations, volume 842 of LNCS, Springer, 1994.