Sub-optimal Recall in Visual Cluster Retrieval: When
                                Clusters Look Like Bridges
                                Mathieu Jacomy1,∗ , Matilde Ficozzi1 and Anders K. Munk2
                                1
                                    Aalborg Universitet, Denmark
                                2
                                    Danmarks Tekniske Universitet, Denmark


                                              Abstract
                                              Force-directed node placement algorithms, a popular technique to visualise networks, are known to op-
                                              timize “cluster separability”: when sets of densely connected nodes get represented as well-separated
                                              groups of dots. Using these techniques leads us to conceive networks as sets of clusters connected by
                                              bridges. This is also how we tend to think of the “community structure” model embedded in clustering
                                              techniques like modularity maximization. Yet this mental model has flaws. We specifically address the
                                              notion that clusters (“communities”) necessarily look like groups of dots, through the mediation of a
                                              node placement algorithm. Although often true, we provide a reproducible counterexample: topolog-
                                              ical clusters that look like bridges. First, we present an empirical case that we encountered in a real
                                              world situation, while mapping the academic landscape of AI and algorithms. Second, we show how to
                                              generate a network of arbitrary size where a cluster looks like a bridge. In conclusion, we open a dis-
                                              cussion about layout algorithms as a visual mediation of a network’s community structure. We contend
                                              that when it comes to the accuracy of retrieving clusters visually, node placement algorithms have an
                                              imperfect recall despite an excellent precision.

                                              Keywords
                                              human-centered computing, graph drawing, network visualization, community detection, visual cluster
                                              retrieval, visual network analysis


                                1. Introduction
                                The purpose of drawing a large graph (a.k.a. making a network map) is generally to medi-
                                ate its topological features. One aims to produce visual patterns providing insights about its
                                community structure [22]. The expected visual pattern mainly consists of visually separated
                                aggregates of dots, that one interprets as clusters or communities, i.e. densely connected sets
                                of nodes. In practice, one rarely sees a cluster if it does not exist in the topology (although we
                                do not assess that statement here). And one tends to assume that conversely, if a cluster exists
                                in the graph topology, it is necessarily visible in the layout. In this paper, we argue that this
                                assumption can be wrong in common situations. We find that a common topological cluster
                                pattern could often get mediated differently from the compact aggregate of dots one generally
                                expects, for instance as a bridge (Figure 1). In short, some clusters can be dense enough to be
                                consistently picked up by common clustering techniques, yet not dense enough to be displayed

                                CHR 2024: Computational Humanities Research Conference, December 4–6, 2024, Aarhus, Denmark
                                ∗
                                 Corresponding author.
                                £ mathieu.jacomy@gmail.com (M. Jacomy); matildefic@ikl.aau.dk (M. Ficozzi); ankm@dtu.dk (A. K. Munk)
                                ȉ 0000-0002-6417-6895 (M. Jacomy); 0009-0001-8660-5734 (M. Ficozzi); 0000-0002-5542-3065 (A. K. Munk)
                                            © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


                                                                                                            1075
CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
Figure 1: A network with a bridge-like cluster generated by our method. Node colors represent clusters
detected by clique percolation (top) and modularity maximization by the Leiden method (bottom). The
clusters on the side are denser (cliques) than the bridge-like cluster (a “stretching”, see section 3).


as visual aggregates by common layout techniques.
   In section 2 we report our observations from a real-world situation where we find such
clusters in a co-occurrence network about AI and algorithms in science. In section 3 we draw on
those observation to build a procedure to generate clusters that look like bridges. We conclude
by framing this issue as an accuracy problem for the task of retrieving clusters visually, more
specifically as a recall issue.

1.1. Related work
If community detection techniques like modularity maximization [3, 12, 21] have become a
staple of visual network analysis, it is notably because the retrieved communities closely match
the visual clusters produced by force-driven algorithms. This result was voiced by Noack as
“modularity clustering is a force-directed layout” [14].
   Noack formalized graph drawing as an optimization problem about the “separation of com-
munities” [14]. In this mental model, graphs are imagined as communities connected by
bridges. On a practical level, using force-directed layouts and community detection algorithms
to analyze networks reinforces the notion that a network’s community structure solely consists
of clusters connected by bridges (Figure 2). Indeed, the visualizations produced in popular tools
like Cytoscape [19] Gephi [1] or NodeXL [20] often feature separated groups of nodes (seen as
clusters) connected by edges or chains of edges (bridges). In this context, it is most reasonable
to assume that a node is either part of a cluster, of a bridge, or neither.
   In this implicit mental model, the macro-structure is nested and graph-like: clusters play
the role of nodes, and bridges that of edges. Yet as it turns out, relevant patterns in graphs are
not limited to the cluster/bridge dichotomy. We generally consider bridges as made of nodes
[10, 18], sometimes made of edges [7], but rarely as a mix of both; yet such structures can
effectively be an important part of the community structure. Intuitively, very dense clusters
may be connected by less dense clusters.
   The notion of community structure has multiple definitions [16], meanings [6], computa-
tional commitments [5, 17] and visual representations [2, 22]. In the next section, we report on


                                                1076
Figure 2: Implicit mental model of modular graphs


a real-world case where the community structure has richer patterns than the cluster/bridge
dichotomy. It shows that the nuance and complexity of a network’s structure can be lost in
the translation enacted by the layout, notably when some “communities” get represented as
visually non-compact shapes.


2. Case: a co-occurrence network of keywords from articles
   about AI and algorithms
Description of the case: We generated this network by extracting expressions co-occurring
in the abstract or title of academic articles mentioning AI, algorithms, or machine learning. Our
motivation was to map what algorithms were doing in science. We present our full methodol-
ogy and our statistical analysis of the corpus in another paper [11].
   This network’s 7,562 nodes represent the most frequent expressions in our corpus, such as
“encrypted”, “image denoising”, or “CNNs”. Each of the 85,215 edges represents a co-occurrence
of connected expressions with a sufÏciently high pointwise mutual information (PMI) score [4].
Through clique percolation [15] with 𝑘 = 7 we uncovered 166 clusters corresponding to various
topics or semantic fields. We rendered the graph as a network map using the layout algorithm
LinLog [13] using the Force Atlas 2 implementation [9]. In order to annotate the network
map and use it as a discussion elicitation device with our project partners, we underwent the
following task: for each cluster, sample the most representative documents, read them, and
produce a short summary of the purpose and agency of AI and algorithms in those publications
[11].
   The network obtained has a blatant community structure, and at first glance, seems to consist
of clusters connected by bridges (Figure 3).

Issue at stake: Our annotation task demanded us to explain visible clusters, as rendered by
the layout algorithm, while we had computed topological clusters through clique percolation.
To our surprise, we faced many mismatches: some computed clusters did not look like groups
of dots in the network map, or conversely, some clearly visible clusters were not captured by
the clique percolation process.
   One may think that when there is a mismatch between computed and visualized clusters, the
computation should always take precedence over the visual, because it is more true to the data


                                             1077
Figure 3: Screenshot of the network in Gephi [1].


(then the mismatch would not be an issue). But that perspective fails to account for the purpose
of annotation, which is to provide a context for the visible patterns. One cannot annotate a
pattern that is not visible, even if it exists in the data; and conversely, one should annotate the
artifacts of the method, i.e. anti-patterns that are visible but do not exist in the data (precisely
to mark them as artifacts).
   As soon as a visualization is used, even for exploratory data analysis, accounting for what it
makes visible becomes a methodological necessity.

Sub-issue 1: granularity level. To retrieve clusters we used clique percolation as we did not
need to annotate the entire network (only dense areas). We picked a clique size of 𝑘 = 7 because
it was low enough to capture the smallest relevant clusters. In many cases, this approach
worked: the cluster retrieved looked like what one expects (Figure 4).
   However, this approach had a significant drawback: 𝑘 was too low to break the bigger
and denser clusters down to subclusters (Figure 5). In the most extreme case, one of the re-
trieved clusters contained the 31% of the nodes (the entire semantic field related to health and
medicine).
   Large computed clusters were an issue because they contained multiple smaller visual clus-
ters. However, this issue was easy to address by subdividing them until we reached the desired
granularity. We chose the modularity maximization by the Leiden method [21] for the conve-
nience of its resolution setting.


                                               1078
Figure 4: A well-behaved cluster. This cluster from clique percolation looks like a well-separated
group of dots. Top: the cluster as shown in Gephi (in green). Bottom: how we annotated it in the final
map.


Figure 5: A recalcitrant cluster (“renewable energy storage”, partial screenshot).


Sub-issue 2: clusters not looking like clusters. Some clusters were not only too big, but
also contained non-compact structures, such as bridges. We could not solve that issue, and had
to live with the trouble. For instance, the recalcitrant cluster of Figure 5 seemed to contain a
mix of clusters and bridges, which appears more clearly if we isolate it from the rest (Figure 6-
A). It is not compact, but extremely spiky. Yet from the method, its topology satisfies the same
criterion as the well-behaved cluster of Figure 4: each node has at least 6 fully interconnected
neighbors in the cluster.
   The spikiness is partly due to how this cluster is interconnected with the rest of the map,


                                                1079
Figure 6: The recalcitrant cluster from Figure 5 (“renewable energy storage”) isolated and in full view.
A: The cluster with the map’s original layout but without the other nodes. B: After the layout is applied
again.


and partly due to its internal structure. We can observe it by applying the same layout to the
isolated network (Figure 6-B): although the layout is less spiky and more compact, it remains
very elongated. Even when we repeat this process recursively, many clusters keep resisting
and refuse to become compact unless we tear them down to confetti (Figure 7).
   We had to accept that some clusters are dense enough to be captured by community detection
(both clique percolation and modularity clustering) yet not dense enough to look like compact
shapes on the map. Many of those clusters looked like bridges, stretching between compact
clusters. We ultimately decided to annotate the compact clusters and the bridge clusters the
same way, the only difference being that compact clusters got attached to a landmark point
while the title of bridge clusters followed the contour of the connection (Figure 8).

Takeaways from the case. By splitting initial clusters to the desired granularity, we settled
on 235 sub-clusters to annotate. Of those, 58 were of the bridge kind (25%): at least in this case,
bridge-looking clusters are not a marginal phenomenon.
   We also observed that those bridges were straightforward to explain. The main bridge in
our recalcitrant cluster was about renewable energy storage, and it connected the topics of re-
newable energy and power distribution with those of smart grids and lithium-ion batteries, which
makes a lot of sense. Other examples included scanning for cancer bridging scanners with cancer
research; or language bridging text analysis with speech recognition; etc.


                                                 1080
Figure 7: Recalcitrant cluster from Figure 5 broken down recursively: it never gets visually compact.


Figure 8: Recalcitrant cluster in the final map (under the label “renewable energy storage”).


3. Generating clusters that look like bridges
Drawing on these observations, we devised a method to generate clusters that look like bridges.
An example is shown in Figure 1.


                                                1081
Criteria satisfied: The method works for any clique size 𝑘 set in advance. It generates an
arbitrarily large graph such that clique percolation using said 𝑘 will find exactly 3 clusters, one
of which will look like a bridge (it get represented as a visually non-compact set of dots spread
out between the two other clusters).
   Our tests show that modularity clustering by the Louvain and Leiden methods will also find
3 clusters, although it depends on the resolution used (we cannot guarantee the result as those
algorithms are not deterministic). However, the result for clique percolation is granted by
design of the method.

Method: Generate two cliques of a size significantly greater than 𝑘. Generate a “stretching”
[8] by stacking as many cliques of size 𝑘 as desired, as shown in Figure 9. Merge 𝑘 − 2 nodes
from the first clique with as many of the first stacked clique of the stretching; similarly merge
𝑘 − 2 nodes of the second clique with the last stacked clique of the stretching. More details in
the reference implementation (Python).


Figure 9: Stretching: stacked cliques.


   In the example of Figure 1, we used 𝑘 = 100; the cliques had 500 nodes and the stretching
stacked 400 cliques.

Why it works: Each clique will be retrieved by clique percolation because its size is greater
than 𝑘. The stretching will be retrieved because it satisfies clique percolation by construction.
However, the merge of 𝑘 − 2 creates a bottleneck that prevents clique percolation from joining
the stretching with either clique, because it is smaller than the minimum overlap of 𝑘 − 1 nodes
required.
   The stretching will look like a bridge for the same two reasons we mentioned in our obser-
vations. First, it is less compact than a clique by design; in fact, it is just compact enough to
satisfy the criterion of clique percolation at the decided level 𝑘. Second, it gets pulled in two
opposite directions by the cliques, whose mutual repulsion is dominating the balance of forces
in the layout.


                                              1082
4. Bridge-like clusters as an accuracy problem for visual cluster
   retrieval
The situation can be reformulated as an accuracy problem for the task of retrieving clusters
visually, which is the purpose of force-directed layout algorithms as formalized by their authors
(for instance [13, 9]; see also [22]). The clusters we see but cannot retrieve by computational
means are the false positives, responsible for the task’s precision, which we assume as generally
good (or at least, it is not the focus of this paper). Conversely, the clusters that do not look like
a compact group of dots and remain undetected are the false negatives responsible for the recall
(Figure 10).


Figure 10: Accuracy for visual cluster retrieval: bridge-like clusters create a recall problem.


5. Conclusion: visual cluster retrieval has a sub-optimal recall
We presented an empirical case where clusters retrieved from modularity clustering did not
always look like the compact groups of dots we usually expect. Building on our observations,
we built a method to generate clusters that look like bridges: satisfying the criterion of clique
percolation for an arbitrary clique size, yet looking like bridges stretched between compact
clusters when visualized by a force-directed layout. We do not provide evidence about the
pervasiveness of these visually non-compact clusters, but our generation method shows that
the conditions for their emergence are commonplace in the context of large graphs with a
community structure.
   The existence of bridge-like clusters creates a recall issue for the task of retrieving clusters
visually from a force-directed layout, even when the precision is good: unbeknownst to the
visualization’s reader, some clusters may remain unseen. Being aware of that possibility is an
important insight for the researchers and experts using network maps to explore relational
data.


                                                 1083
References
 [1] M. Bastian, S. Heymann, and M. Jacomy. “Gephi: an open source software for exploring
     and manipulating networks”. In: Proceedings of the international AAAI conference on web
     and social media. Vol. 3. 2009, pp. 361–362.
 [2] C. Bennett, J. Ryall, L. Spalteholz, and A. Gooch. “The aesthetics of graph visualization.”
     In: CAe. 2007, pp. 57–64.
 [3] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. “Fast unfolding of com-
     munities in large networks”. In: Journal of statistical mechanics: theory and experiment
     2008.10 (2008), P10008.
 [4] G. Bouma. “Normalized (pointwise) mutual information in collocation extraction”. In:
     Proceedings of GSCL 30 (2009), pp. 31–40.
 [5] S. Fortunato. “Community detection in graphs”. In: Physics reports 486.3-5 (2010), pp. 75–
     174.
 [6] L. C. Freeman. “The sociological concept of” group”: An empirical test of two models”.
     In: American journal of sociology 98.1 (1992), pp. 152–166.
 [7] L. HAO, X. Li, X. Liu, and W. Wang. “BBTA: Detecting communities incrementally from
     dynamic networks based on tracking of backbones and bridges”. In: Applied Intelligence
     53 (2022). doi: 10.1007/s10489-022-03418-2.
 [8] M. Jacomy. “Situating Visual Network Analysis”. PhD thesis. Aalborg University, 2021.
     doi: 10.54337/aau435977255.
 [9] M. Jacomy, T. Venturini, S. Heymann, and M. Bastian. “ForceAtlas2, a continuous graph
     layout algorithm for handy network visualization designed for the Gephi software”. In:
     PloS one 9.6 (2014), e98679.
[10]   N. Meghanathan. “Neighborhood-based bridge node centrality tuple for complex net-
       work analysis”. In: Applied Network Science 6 (2021). url: https://api.semanticscholar.or
       g/CorpusID:235691166.
[11]   A. K. Munk, M. Jacomy, M. Ficozzi, and T. E. Jensen. “Beyond artificial intelligence con-
       troversies: What are algorithms doing in the scientific literature?” In: Big Data & Society
       11.3 (2024), p. 20539517241255107. doi: 10.1177/20539517241255107.
[12]   M. E. Newman and M. Girvan. “Finding and evaluating community structure in net-
       works”. In: Physical review E 69.2 (2004), p. 026113.
[13]   A. Noack. “Energy models for graph clustering.” In: J. Graph Algorithms Appl. 11.2 (2007),
       pp. 453–480.
[14]   A. Noack. “Modularity clustering is force-directed layout”. In: Physical review E 79.2
       (2009), p. 026102.
[15]   G. Palla, I. Derényi, I. Farkas, and T. Vicsek. “Uncovering the overlapping community
       structure of complex networks in nature and society”. In: nature 435.7043 (2005), pp. 814–
       818.


                                              1084
[16]   L. Peel, D. B. Larremore, and A. Clauset. “The ground truth about metadata and commu-
       nity detection in networks”. In: Science Advances 3.5 (2017), e1602548. doi: 10.1126/scia
       dv.1602548.
[17]   T. P. Peixoto. Descriptive vs. inferential community detection in networks: Pitfalls, myths
       and half-truths. Cambridge University Press, 2023.
[18]   A. N. Rabchevskiy, V. S. Zayakin, and E. A. Rabchevskiy. “A Method for Identifying
       Bridges in Online Social Networks”. In: Recent Trends in Analysis of Images, Social Net-
       works and Texts - 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16-
       18, 2021, Revised Supplementary Proceedings. Ed. by E. Burnaev, D. I. Ignatov, S. I. 0002,
       M. Y. Khachay, O. Koltsova, A. Kutuzov, S. O. Kuznetsov, N. V. Loukachevitch, A. Napoli,
       A. Panchenko, P. M. Pardalos, J. Saramäki, A. V. Savchenko, E. Tsymbalov, and E. Tu-
       tubalina. Vol. 1573. Communications in Computer and Information Science. Springer,
       2021, pp. 166–175. doi: 10.1007/978-3-031-15168-2\_14.
[19]   P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B.
       Schwikowski, and T. Ideker. “Cytoscape: a software environment for integrated mod-
       els of biomolecular interaction networks”. In: Genome research 13.11 (2003), pp. 2498–
       2504.
[20]   M. Smith, N. Milic-Frayling, B. Shneiderman, E. Mendes Rodrigues, J. Leskovec, and C.
       Dunne. NodeXL: a free and open network overview, discovery and exploration add-in for
       Excel 2007/2010. 2010.
[21]   V. A. Traag, L. Waltman, and N. J. Van Eck. “From Louvain to Leiden: guaranteeing well-
       connected communities”. In: Scientific reports 9.1 (2019), p. 5233.
[22]   T. Venturini, M. Jacomy, and P. Jensen. “What do we see when we look at networks:
       Visual network analysis, relational ambiguity, and force-directed layouts”. In: Big Data
       & Society 8.1 (2021), p. 20539517211018488.


                                              1085