<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Simple Heuristic for the Graph Tukey Depth Problem with Potential Applications to Graph Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Florian Seifarth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tamás Horváth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Wrobel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of Bonn</institution>
          ,
          <addr-line>Bonn</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer Center for Machine Learning</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <addr-line>Schloss Birlinghoven, Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We study a recently introduced adaptation of Tukey depth to graphs and discuss its algorithmic properties and potential applications to mining and learning with graphs. In particular, since it is NP-hard to compute the Tukey depth of a node, as a first contribution we provide a simple heuristic based on maximal closed set separation in graphs and show empirically on different graph datasets that its approximation error is small. Our second contribution is concerned with geodesic core-periphery decompositions of graphs. We show empirically that the geodesic core of a graph consists of those nodes that have a high Tukey depth. This information allows for a parameterized deterministic definition of the geodesic core of a graph.</p>
      </abstract>
      <kwd-group>
<kwd>graph Tukey depth</kwd>
        <kwd>geodesic closure</kwd>
        <kwd>closed set separations</kwd>
        <kwd>geodesic core-periphery decomposition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>[Fig. 1: comparison of degree centrality, closeness centrality, and betweenness centrality with Tukey depth, from small to large values.]</p>
      <sec id="sec-1-3">
<p>machine learning approaches that are based on hyper-plane separations [6, 7]. For ℝ^d and other more general metric spaces [8], Tukey depth has been studied in the context of machine learning, in particular, for object classification [9]. In the case of learning linear classifiers, it is shown in [10] that the Tukey median, i.e., the set of points with the highest Tukey depth, is related to the Bayes point.</p>
        <p>
An important advantage of Tukey depth over other centrality measures is that the exact geometric position of the points in ℝ^d is not relevant for their depth. Thus, the Tukey depth is relatively stable with respect to outliers and is therefore used for outlier detection [11, 12]. Moreover, as Tukey depth relies on separations, it is possible to adapt it to other domains enabling (abstract) hyper-plane or half-space separation. In particular, this property of the original Tukey depth is utilized in its adaptation to geodesic closure systems over graphs [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
Similarly to the fact that the computation of the Tukey depth in ℝ^d is NP-hard [13], it is also NP-hard to compute the graph Tukey depth of a node [
          <xref ref-type="bibr" rid="ref3">3</xref>
]. Motivated by this negative result, one of our main contributions is a heuristic algorithm for approximating graph Tukey depth. It runs in time polynomial in the size of the input graph and approximates the Tukey depth of a node with one-sided error by overestimating it. Our experimental results with small graphs clearly demonstrate that the approximation is close to the exact Tukey depth; note that for larger graphs we were not able to evaluate the approximation performance of our algorithm, as it was not possible to calculate the exact Tukey depth in practically feasible time.
        </p>
<p>Our heuristic is based on our greedy algorithm designed in [14] for solving the more general maximal closed set separation (MCSS) problem. In the particular case of graphs, the problem is as follows: Given two subsets A, B of the node set, find, if they exist, two (inclusion) maximal disjoint geodesically closed node sets that separate A and B from each other. As maximal disjoint closed sets provide a separation of sets, it is natural to ask the following question: Is there a connection between graph Tukey depth and node separation with geodesically closed node sets? We give an affirmative answer to this question by showing experimentally on small graph datasets that the solutions of the MCSS problem provide a good approximation of the exact Tukey depth. This mainly relies on the following fact: For any closed set containing at least one node of high Tukey depth, there exists no large disjoint closed set.</p>
<p>It follows from the definition of graph Tukey depth that it is related to other concepts based on geodesic convexity. One of these notions is the recent probabilistic definition of the geodesic core-periphery decomposition of graphs. It was introduced in [15] and studied in [15, 16, 17, 18]. Hence, our second question is concerned with the following problem: Is there a connection between graph Tukey depth and geodesic core-periphery decompositions? The geodesic core-periphery decomposition breaks up some types of graphs (including interaction graphs, such as social networks) into a dense core and a sparse “surrounding” periphery [15] (see Fig. 3 for a visual example). While some graphs (e.g., Erdős-Rényi, Barabási-Albert, and Watts-Strogatz random graphs) seem to have no periphery, others (e.g., trees and fully connected graphs) seem to have no core. Since this behavior is not yet well understood, we try to understand it by studying the nodes’ Tukey depth. It seems that the geodesic core of a graph consists of those nodes that are of high Tukey depth (see Fig. 4 for some examples). This observation allows for a parameterized deterministic definition of the cores. That is, the core of a graph can be defined as the set of those nodes that have a Tukey depth greater than a user-specified threshold. Our empirical results clearly demonstrate that, using the right threshold, the probabilistic definition of cores in [15] coincides with our deterministic one.</p>
<p>The rest of the paper is organized as follows. In Sec. 2 we first collect the necessary notions and notation. Sec. 3 contains some examples showing that graph Tukey depth is strongly related to mining and learning algorithms on graphs that rely on geodesic convexity. In Sec. 4 we present our heuristic for approximating graph Tukey depth and evaluate it empirically on small graph datasets. Finally, in Sec. 5 we mention some open questions for future research.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
<p>In this section, we collect the necessary notions and fix the notation. For a graph G = (V, E), V(G) and E(G) denote the set V of nodes and the set E of edges. Furthermore, n and m denote |V(G)| and |E(G)|, respectively. By graphs, we always mean undirected graphs without loops and multi-edges. For any u, v ∈ V(G), the (geodesic) interval [u, v] is the set of all nodes on all shortest paths between u and v (see Fig. 2a for an example). A set of nodes C ⊆ V(G) is called (geodesically) closed if for all u, v ∈ V(G), u, v ∈ C implies [u, v] ⊆ C. The closure ρ(S) of a set S ⊆ V(G) is the smallest closed set containing S (see Fig. 2b for an example of ρ({u, v})).</p>
      <p>
        The Graph Tukey Depth Problem For a graph G, the Tukey depth of a node v ∈ V(G) is defined as follows [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: Let C ⊆ V(G) be a closed set of maximum cardinality such that v ∉ C. Then the Tukey depth of v is defined by td(v) := |V(G)| − |C|. That is, v has a high Tukey depth if all closed sets not containing v are of small cardinality. We are interested in the graph Tukey depth problem defined as follows: Given a graph G, compute td(v) for all v ∈ V(G).
      </p>
      <p>
        Geodesic Cores Geodesic cores have been defined in [15] in a probabilistic way. Informally, the geodesic core of a graph consists of those nodes which are contained in most geodesically closed sets that are generated by a small number of random nodes. Of course, the core defined in this way can be empty. However, it turns out that this is not the case for interaction graphs (e.g., social networks) [15]. We define the geodesic core of a graph G, denoted by C_G, as follows. Let S_1, S_2, . . . be a sequence of sets, where each set consists of k &gt; 0 nodes selected independently and uniformly at random from V(G). Then C_G = ⋂_{i=1}^{t} ρ(S_i) is the core of G, where t is the smallest integer satisfying ⋂_{i=1}^{t} ρ(S_i) = ⋂_{i=1}^{t+1} ρ(S_i). Obviously, this definition is not deterministic since different choices of k and of the S_i can lead to different cores. Nevertheless, the experiments in [16] with large real-world networks show that for k ≈ 10, the core (if it exists) does not depend on the particular choice of the generator elements. The core-periphery decomposition of a graph is composed of the subgraph induced by the core nodes and that induced by the remaining nodes, called the periphery. In Fig. 3 we give a visual example of the core-periphery decomposition of the CA-GrQc network [19]1.
      </p>
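To make the interval and closure operators above concrete, the following minimal Python sketch computes both via a naive fixpoint over BFS distances. It assumes a connected graph; the example graph (a 4-cycle with a chord) and all function names are ours, not from the paper.

```python
from collections import deque
from itertools import combinations

def distances(adj):
    """All-pairs shortest-path lengths via BFS (graph assumed connected)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    q.append(w)
        dist[s] = d
    return dist

def interval(u, v, adj, dist):
    """Geodesic interval [u, v]: all nodes lying on a shortest u-v path."""
    return {w for w in adj if dist[u][w] + dist[w][v] == dist[u][v]}

def closure(S, adj, dist):
    """Smallest geodesically closed superset of S (fixpoint of intervals)."""
    C = set(S)
    changed = True
    while changed:
        changed = False
        for u, v in combinations(sorted(C), 2):
            new = interval(u, v, adj, dist) - C
            if new:
                C |= new
                changed = True
    return C

# 4-cycle 0-1-2-3 with chord 0-2
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
dist = distances(adj)
print(interval(1, 3, adj, dist))   # both shortest 1-3 paths pass through 0 or 2
print(closure({1, 3}, adj, dist))
```

Note that the fixpoint loop is quadratic in the closure size per pass; it is meant for illustration, not for the large networks discussed later.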
<p>MCSS Problem for Graphs We will use the maximal closed set separation (MCSS) problem restricted to geodesic closures over graphs (see [14] for the generic definition): Given a graph G = (V, E) and A, B ⊆ V(G) with disjoint geodesic closures, find two geodesically closed sets A′, B′ ⊆ V(G) with A′ ∩ B′ = ∅ and A ⊆ A′, B ⊆ B′ that are maximal, i.e., there exist no proper supersets of A′ resp. B′ that fulfill the above properties.</p>
<p>1This network is built by the co-authorships in the general relativity and quantum cosmology community.</p>
      <p>[Fig. 3: (a) the entire CA-GrQc network, (b) its (geodesic) core, and (c) its periphery.]</p>
    </sec>
    <sec id="sec-2-1">
      <title>3. Potential Applications to Mining and Learning with Graphs</title>
      <p>Before we turn to the algorithmic aspects of the graph Tukey depth problem, in this section we first present some of its potential applications to mining and learning with graphs. In particular, this section deals with some connections of graph Tukey depth to node separations and to geodesic core-periphery decompositions. We state three important properties of Tukey depth. In particular, Proposition 1 clarifies the role of Tukey depth in the context of geodesically closed sets.</p>
      <p>Proposition 1. Let G be a graph and v ∈ V(G) with td(v) = n − k, and let C ⊆ V(G) be a geodesically closed node set with |C| &gt; k. Then v ∈ C.</p>
      <p>
        Proposition 2 ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Let G be a graph, S ⊆ V(G), and C be the closure of S. Then the graph Tukey depth is a quasi-concave function, i.e., for all v ∈ C we have td(v) ≥ min{td(u) : u ∈ S}.
        Proposition 3 ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Let G be a graph, k ∈ ℕ, and C = {v ∈ V(G) : td(v) ≥ k}. Then C is geodesically closed.
      </p>
<p>To underline the importance of these three statements, we give two examples that show how they (can) influence machine learning and data mining methods based on geodesic closures. Example 1: Node Classification and Active Learning In [14, 20, 21, 22], disjoint half-spaces and closed sets are used for binary classification in closure systems, for node classification, and for active learning in graphs using geodesic convexity. Given the Tukey depth td(v) of a node v, Proposition 1 immediately implies that a separating half-space or closed set not containing v cannot have a cardinality greater than n − td(v). Thus, for nodes of high Tukey depth, there is no large geodesically closed set not containing them. Hence, Proposition 1 implies a nice theoretical connection between graph Tukey depth and the maximum size of separating half-spaces and closed sets. Using approximate graph Tukey depths, the predictive performance of all the above methods can possibly be improved.</p>
<p>[Fig. 4: exact Tukey depths (from small to large) and geodesic cores of three example networks.]</p>
      <p>
Example 2: Geodesic Core-Periphery Decomposition The geodesic core-periphery decomposition of graphs was analyzed in [15, 16, 17]. In particular, it was found in [15, 17] that many social networks consist of a dense geodesic core “surrounded” by a sparse periphery (see Fig. 3 for an example). While some graphs, especially tree-like graphs, seem to have no core, others, such as graphs sampled from random graph models (e.g., Erdős-Rényi, Barabási-Albert, and Watts-Strogatz), seem to have no periphery. Moreover, the closure of a small number of randomly chosen graph nodes (≈ 10) always contains the geodesic core (if it exists). Furthermore, if the nodes are sampled from the geodesic core only, then the closure of the nodes is the geodesic core itself. If we compute the closure of, say, 10 randomly chosen nodes from the entire network (Fig. 3a), then the closure always contains the core (orange nodes in Fig. 3b). If all random nodes belong to the core (orange nodes in Fig. 3a), then their closure is the core itself. The above statements explain this behavior. Using that the core is always contained in the closure of a small number of randomly chosen nodes, it follows from Proposition 2 that the nodes in the core are those with the highest Tukey depths. Moreover, the quasi-concave property of graph Tukey depth implies that if the core is generated by a few nodes from the core, then the core nodes must have very similar Tukey depths. Finally, by Proposition 3, the set of nodes in a graph with a Tukey depth above some threshold is always geodesically closed; cores arise as a special case of this property. These three properties motivate the following deterministic definition of geodesic cores:
Definition 1. The k-geodesic core of a graph G is defined by</p>
      <p>C_G^k := {v ∈ V(G) : td(v) ≥ k} .</p>
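Definition 1 can be checked directly on toy graphs. The sketch below computes exact Tukey depths by brute-force enumeration of all geodesically closed sets (exponential, so tiny graphs only) and derives the k-geodesic core from them; the helper names are ours, and this is not the ILP-based exact algorithm of [3].

```python
from collections import deque
from itertools import combinations

def distances(adj):
    """All-pairs BFS distances (graph assumed connected)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    q.append(w)
        dist[s] = d
    return dist

def is_closed(C, nodes, dist):
    """C is geodesically closed iff no outside node lies on a shortest
    path between two members of C."""
    return all(dist[u][w] + dist[w][v] != dist[u][v]
               for u, v in combinations(C, 2)
               for w in nodes if w not in C)

def exact_tukey_depths(adj):
    """td(v) = |V(G)| - max{|C| : C closed, v not in C} (brute force)."""
    nodes = list(adj)
    n = len(nodes)
    dist = distances(adj)
    closed_sets = [set(C) for r in range(n + 1)
                   for C in combinations(nodes, r)
                   if is_closed(C, nodes, dist)]
    return {v: n - max(len(C) for C in closed_sets if v not in C)
            for v in nodes}

def k_geodesic_core(adj, k):
    """Definition 1: all nodes of Tukey depth at least k."""
    return {v for v, d in exact_tukey_depths(adj).items() if d >= k}

# Path 0-1-2-3: the inner nodes have depth 2, the endpoints depth 1.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(exact_tukey_depths(path))
print(k_geodesic_core(path, 2))
```

By Proposition 3, the returned core is itself geodesically closed; on the path example, the 2-geodesic core is the inner interval {1, 2}.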
      <p>
        To empirically confirm our claim that the core contains the nodes with the highest Tukey depths, consider the three graphs in Fig. 4. We computed their exact Tukey depths (top) and their geodesic cores (bottom). For the Karate Club network (left), considered also in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the core exactly matches the set of nodes of the highest Tukey depth. Furthermore, there is not much fluctuation in the depths of the core nodes. In fact, all nodes of Tukey depth at most 3 belong to the periphery and all nodes of Tukey depth 19 or 21 to the core; note that there are no nodes of Tukey depth between 4 and 18. In the case of the Les Misérables character network (middle), there is only a single node with a very high Tukey depth of 57, surrounded by nodes of depth less than 35. In this case, the core algorithm returns only the node with the highest Tukey depth, showing that the graph Tukey depth can possibly be used to improve core-periphery decompositions. This is also the case for the Dolphin community graph (right), where the core consists of nodes with Tukey depth greater than 2, while all nodes in the periphery have a Tukey depth of at most 2.
      </p>
    </sec>
    <sec id="sec-3">
      <title>4. Approximating the Tukey Depth</title>
<p>Motivated by the negative complexity result concerning the calculation of the Tukey depth, in Sec. 4.1 below we first propose a heuristic based on the algorithm in [14] that solves the MCSS problem. It approximates the nodes’ Tukey depths with one-sided error. We show experimentally on different types of small graphs that the results obtained by our heuristic are fairly close to the exact ones. Furthermore, even on small graphs, our algorithm is up to 200 times faster than the exact one (Sec. 4.2). It is important to emphasize that we had to resort to such graphs, as it was not possible to calculate the exact Tukey depths for larger graphs in a practically feasible time.</p>
      <sec id="sec-3-1">
        <title>4.1. The Heuristic</title>
        <p>
          Recall that the exact Tukey depth of a node v is defined by td(v) := |V(G)| − |C|, where |C| is the maximum cardinality of a closed set C not containing v. It can be computed exactly using an integer linear program (see [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for the details). The computationally hard part of the problem is to find a closed set of maximum cardinality. Our heuristic addresses this problem by considering an inclusion maximal closed set instead of a maximum cardinality closed set. This relaxation, which of course distorts the exact value of the Tukey depth, allows us to apply the efficient greedy algorithm proposed in [14] for solving the MCSS problem. In what follows, for any v ∈ V(G), t̂d(v) denotes the approximation of td(v) obtained with our heuristic.
        </p>
<p>Given a graph G, the rough idea to approximate the Tukey depth of a node v ∈ V(G) is to find an inclusion maximal geodesically closed set C ⊆ V(G) with v ∉ C. Such a set C can be found by applying the MCSS algorithm [14] with input {v} and {v′}. The output of the algorithm is a solution of the MCSS problem, i.e., it consists of two closed node sets C, C′ ⊆ V(G) with v ∈ C and v′ ∈ C′ that are disjoint and inclusion maximal. That is, there exist no proper closed supersets of C resp. C′ with the same properties. The Tukey depth can then be approximated using the cardinalities of C resp. C′. For a fixed node v, the result depends on the particular choice of v′. To improve the approximation quality, we therefore call the MCSS algorithm for each node v several times, with different nodes v′ ≠ v.</p>
<p>The pseudo-code of the above heuristic is given in Alg. 1. In Line 1 we initialize the Tukey depth of all nodes in G by setting it to the maximum possible value, i.e., to |V(G)|. We repeat the procedure described above for all nodes v ∈ V(G) and all their neighbors v′ ∈ Γ(v) (see the for-loops in Lines 2 and 3).</p>
        <p>Algorithm 1: Approximation of Graph Tukey Depth</p>
        <p>Input: graph G</p>
        <p>Output: approximation t̂d(v) of td(v) for all v ∈ V(G)
1 t̂d(v) ← |V(G)| for all v ∈ V(G);
2 forall v ∈ V(G) do
3     forall v′ ∈ Γ(v) do
4         C′, C ← MCSS({v′}, {v});
5         forall u ∈ V(G) do
6             if u ∉ C′ then
7                 t̂d(u) ← min{t̂d(u), |V(G)| − |C′|};
8             if u ∉ C then
9                 t̂d(u) ← min{t̂d(u), |V(G)| − |C|};
10 return t̂d(v) for all v ∈ V(G)</p>
        <p>In this way we solve the MCSS problem for all input sets {v}, {v′}, i.e., we separate v from all of its neighbors v′ by maximal disjoint closed sets C, C′ (see Line 4). Note that the Tukey depth of a node u is based on a closed set of maximum cardinality not containing u. Thus, if u does not lie in the set C (resp. C′), then the cardinality of C (resp. C′) is smaller than or equal to that of a closed set of maximum cardinality not containing u. Hence, we can update the current Tukey depth approximation of all nodes u ∈ V(G) as follows: take the minimum of the old approximation and the new one, which is the cardinality of V(G) ∖ C if u ∉ C, or that of V(G) ∖ C′ if u ∉ C′ (see Lines 7 and 9).</p>
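As a concrete illustration, Alg. 1 can be sketched in Python roughly as follows. This is a naive sketch under our own assumptions (not the authors' code): the geodesic closure is a simple fixpoint over BFS distances, and MCSS is a single-pass greedy extension in the spirit of [14]; all helper and variable names are ours.

```python
from collections import deque
from itertools import combinations

def distances(adj):
    """All-pairs BFS distances (graph assumed connected)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    q.append(w)
        dist[s] = d
    return dist

def closure(S, nodes, dist):
    """Smallest geodesically closed superset of S (fixpoint of intervals)."""
    C = set(S)
    changed = True
    while changed:
        changed = False
        for u, v in combinations(sorted(C), 2):
            for w in nodes:
                if w not in C and dist[u][w] + dist[w][v] == dist[u][v]:
                    C.add(w)
                    changed = True
    return C

def mcss(A, B, nodes, dist):
    """Greedy maximal closed set separation of A and B (sketch of [14])."""
    Ca, Cb = closure(A, nodes, dist), closure(B, nodes, dist)
    if not Ca.isdisjoint(Cb):
        return None  # closures intersect: A and B are not separable
    for z in nodes:  # one greedy pass suffices for inclusion maximality
        if z in Ca or z in Cb:
            continue
        Cz = closure(Ca | {z}, nodes, dist)
        if Cz.isdisjoint(Cb):
            Ca = Cz
        else:
            Cz = closure(Cb | {z}, nodes, dist)
            if Cz.isdisjoint(Ca):
                Cb = Cz
    return Ca, Cb

def approx_tukey_depth(adj):
    """Alg. 1: overestimate td(v) for every node v."""
    nodes = sorted(adj)
    n = len(nodes)
    dist = distances(adj)
    td_hat = {v: n for v in nodes}              # Line 1: maximum value
    for v in nodes:                             # Line 2
        for v2 in adj[v]:                       # Line 3: neighbors of v
            res = mcss({v2}, {v}, nodes, dist)  # Line 4
            if res is None:
                continue
            C2, C = res
            for u in nodes:                     # Lines 5-9: update all nodes
                if u not in C2:
                    td_hat[u] = min(td_hat[u], n - len(C2))
                if u not in C:
                    td_hat[u] = min(td_hat[u], n - len(C))
    return td_hat
```

On trees this sketch reproduces the exact depths (e.g., on the path 0–1–2–3 it returns depths 1, 2, 2, 1), in line with the remark in Sec. 5 that the heuristic is exact on trees; in general it only overestimates (Proposition 4).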
<p>By construction, Alg. 1 finds only maximal and not maximum closed sets, resulting in a one-sided error in the estimate of the Tukey depths. This is formulated in the proposition below.
Proposition 4. Alg. 1 overestimates the Tukey depth, i.e., for the output t̂d(v) returned by Alg. 1 we have t̂d(v) ≥ td(v) for all v ∈ V(G).</p>
<p>Regarding the runtime of Alg. 1, note that the inner part of the loop in Lines 3–9 is executed O(m) times because we iterate over all neighbors (i.e., every edge is considered twice). The runtime of the inner loop (Lines 3–9) is dominated by the MCSS algorithm called in Line 4, which calls the closure operator at most O(n) times [14]. Since the geodesic closure can be computed in time O(nm) [26], we have the following result for the total runtime of Alg. 1:
Proposition 5. Alg. 1 returns an upper bound of the Tukey depth for all nodes of G in O(n²m²) time.</p>
<p>The runtime of the approximation algorithm can be improved by considering, for each node v, only a fixed number of distinct nodes v′, or by considering a fixed subset U ⊆ V(G) instead of the whole node set V(G) in the outer loop (see Lines 2–9). It is left to further research to analyze how these changes affect the quality of the approximation.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Experimental Evaluation</title>
        <p>
          In this section we empirically evaluate the approximation quality and runtime of Alg. 1 on
datasets containing small graphs2. Regarding the approximation quality, we compare the results
obtained by our heuristic algorithm to the exact Tukey depths computed with the algorithm
in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. For the evaluation, we consider 19 graph datasets of different types (small molecules, small graphs from bioinformatics and computer vision, and small social networks) from [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
(see columns 2–4 of Tab. 1 for the number of graphs and their average numbers of nodes and edges). The average number of nodes of the graphs ranges from 14 (PTC_MM) up to 82 (OHSU); their average number of edges ranges from 14 to 200. The reason for considering small graphs only is that the exact algorithm [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] was unable to calculate the Tukey depth for larger graphs within one day (see the last two columns of Tab. 1). For practical reasons, we removed all disconnected graphs from the original datasets; note that our heuristic works for disconnected graphs as well.
        </p>
<p>The results are presented in Tab. 1. It contains the approximation qualities measured in different ways (columns 5–10) and the runtimes of the exact algorithm (column 11) and of our heuristic (column 12). The datasets are sorted according to their absolute approximation error (column 5 of Tab. 1), i.e., the sum of all differences between the approximated and the exact Tukey depth over all nodes and all graphs in the dataset.</p>
<p>Regarding the absolute error (column 5), our approximation results are equal to the exact Tukey depths for 5 out of the 19 datasets, while their computation was faster by a factor of up to 100 (see PTC_MM). Our algorithm has the largest absolute error of 4155 on the COIL-DEL graphs; note, however, that this dataset consists of 3900 graphs, so the average error per graph is only slightly above one. Additionally, we look at the relative errors (column 6), i.e., the absolute error divided by the sum of all depths. This measure confirms that our algorithm performs very well: the relative errors are below 4 · 10⁻³ for all graph datasets. The per-node error (column 7) is the average error our algorithm makes per node, while the per-graph error (column 8) is the error it makes on average per graph. Regarding the per-node error, the worst case occurs for the COIL-DEL dataset (last row), with an average error of 0.05. For the per-graph error, the worst result was obtained for the OHSU dataset, where we overestimate the sum of all node depths by 1.65 per graph on average. This shows that our approximation algorithm performs very well, especially when considering the average results over the datasets.</p>
<p>Finally, we also studied the worst-case approximations for nodes and graphs. In particular, the columns Max. Node Error resp. Max. Graph Error denote the maximum error of the algorithm on single nodes resp. on single graphs. The results show that there is a very low error of at most 3 per node for 13 out of the 19 datasets. For three graph datasets, the maximum error per node is at most 7, and in three cases we have a maximum error between 11 and 19. Regarding the maximum error per graph, a similar behavior can be observed; note that except for OHSU and Peking_1, the maximum node errors and maximum graph errors are close to each other. This implies that there are only a few nodes with a high approximation error. It is an interesting problem to find the structural properties of such nodes and graphs that are responsible for the high approximation errors. The last two columns show the runtimes of the two algorithms. Our algorithm (last column) is faster than the exact one by at least one order of magnitude.</p>
        <p>2See https://github.com/fseifarth/AppOfTukeyDepth for the code.</p>
        <p>[Table 1: for each dataset, the number of graphs, the average numbers of nodes and edges, the absolute, relative, per-node, and per-graph approximation errors, the maximum node and graph errors, and the runtimes (s) of the exact algorithm and of Alg. 1.]</p>
        <p>In summary, the evaluation of Alg. 1 clearly demonstrates that our heuristic performs well in approximating the graph Tukey depth. It is faster (sometimes more than 100 times) than the exact algorithm, even on small graph datasets. Regarding larger graphs, this gap in runtime will increase because of the exponential runtime of the exact algorithm. Additionally, the very small relative errors (at most 4 · 10⁻³), the average errors (at most 1.65 per graph), and also the worst-case errors show that the algorithm can be used effectively for further applications based on the Tukey depth (see Sec. 3).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Concluding Remarks</title>
<p>Graph Tukey depth appears to be an interesting and promising new concept for mining and learning with graphs. The study of the relationship of graph Tukey depth to other node centrality measures is an interesting question for further research (cf. Fig. 1). For example, while the centroid(s) in trees [27, 28] are exactly the nodes with the highest Tukey depth, this is not necessarily the case for more general graphs beyond trees.</p>
<p>Another important issue is to better understand the semantics of graph Tukey depth. For example, what are the properties of the nodes with the highest depth (cf. the definition of the Tukey median in ℝ^d)? We have empirically demonstrated that graph Tukey depth can be closely approximated in small graphs. It is an open question whether this result holds for (very) large graphs as well. To answer this question, the scalability of our approximation algorithm should be improved on the one hand; on the other hand, one needs (possibly tight) theoretical upper bounds on graph Tukey depths. Another interesting question is to identify graph classes for which our heuristic always returns the exact graph Tukey depth. While this is the case for trees, it is unclear whether it holds for outerplanar graphs as well. We believe that this question can be answered affirmatively by using techniques from [16]. We have shown experimentally that there is a connection between geodesic core-periphery decompositions and the deterministic definition of k-geodesic cores. This suggests that using our core approximation algorithm [16], we can closely approximate the set of nodes with the highest Tukey depth.</p>
<p>Acknowledgments</p>
      <p>This research has been funded by the Federal Ministry of Education and Research of Germany and the state of North Rhine-Westphalia as part of the Lamarr-Institute for Machine Learning and Artificial Intelligence, LAMARR22B. The authors gratefully acknowledge this support.</p>
      <p>[5] J. W. Tukey, Mathematics and the picturing of data, Proceedings of the International</p>
      <p>Congress of Mathematicians, Vancouver, 1975 2 (1975) 523–531.
[6] F. Rosenblatt, The perceptron: A probabilistic model for information storage and
organization in the brain, Psychological Review 65 (1958) 386–408.
[7] B. E. Boser, I. M. Guyon, V. N. Vapnik, A training algorithm for optimal margin classifiers,</p>
      <p>COLT, ACM, 1992, pp. 144–152.
[8] X. Dai, S. Lopez-Pintado, for the Alzheimer’s Disease Neuroimaging Initiative, Tukey’s
depth for object data, Journal of the American Statistical Association (2022) 1–13.
[9] P. Mozharovskyi, Contributions to depth-based classification and computation of the</p>
      <p>Tukey depth, Ph.D. thesis, University of Cologne, 2015.
[10] R. Gilad-Bachrach, A. Navot, N. Tishby, Bayes and Tukey meet at the center point, in:</p>
      <p>Learning Theory, Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp. 549–563.
[11] C. Becker, U. Gather, The masking breakdown point of multivariate outlier identification
rules, Journal of the American Statistical Association 94 (1999) 947–955.
[12] D. Bremner, D. Chen, J. Iacono, S. Langerman, P. Morin, Output-sensitive algorithms for
Tukey depth and related problems, Statistics and Computing 18 (2008) 259–266.
[13] D. Johnson, F. Preparata, The densest hemisphere problem, Theor. Comput. Sci. 6 (1978)
93–107.
[14] F. Seifarth, T. Horváth, S. Wrobel, Maximal closed set and half-space separations in finite
closure systems, in: ECML PKDD 2019, volume 11906 of LNCS, Springer, 2019, pp. 21–37.
[15] M. Tilen, L. Šubelj, Convexity in complex networks, Network Science 6 (2018) 176–203.
[16] F. Seifarth, T. Horváth, S. Wrobel, A fast heuristic for computing geodesic cores in large
networks, 2022. URL: https://arxiv.org/abs/2206.07350.
[17] L. Šubelj, Convex skeletons of complex networks, J. R. Soc. Interface 15 (2018).
[18] L. Šubelj, D. Fiala, T. Ciglaric, L. Kronegger, Convexity in scientific collaboration networks,</p>
      <p>Journal of Informetrics 13 (2019) 10–31.
[19] J. Leskovec, A. Krevl, SNAP Datasets: Stanford large network dataset collection, http:
//snap.stanford.edu/data, 2014.
[20] P. H. M. de Araújo, M. B. Campêlo, R. C. Corrêa, M. Labbé, The geodesic classification
problem on graphs, volume 346 of Elec. Notes Theor. Comput. Sci., Elsevier, 2019, pp. 65–76.
[21] F. Seifarth, T. Horváth, S. Wrobel, Maximum margin separations in finite closure systems,
in: ECML PKDD 2020, volume 12457 of LNCS, Springer, 2020, pp. 3–18.
[22] M. Thiessen, T. Gärtner, Active learning of convex halfspaces on graphs, in: Advances in</p>
      <p>Neural Information Processing Systems, 2021.
[23] W. W. Zachary, An information flow model for conflict and fission in small groups, Journal
of Anthropological Research 33 (1977) 452–473.
[24] D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing,
Association for Computing Machinery, 1993.
[25] D. Lusseau, The emergent properties of a dolphin social network, Proceedings of the</p>
      <p>Royal Society of London. Series B: Biological Sciences 270 (2003).
[26] I. M. Pelayo, Geodesic Convexity in Graphs, Springer New York, 2013.
[27] W. Piotrowski, A generalization of branch weight centroids, Appl. Math. 19 (1987) 541–545.
[28] W. Piotrowski, M. M. Syslo, Some properties of graph centroids, Ann. Oper. Res. 33 (1991)
227–236.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <source>Networks: An Introduction</source>
          , Oxford University Press, Inc., USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Kriege</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mutzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <article-title>TUDataset: A collection of benchmark datasets for learning with graphs</article-title>
          ,
          <source>in: ICML, GRL Workshop</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. O.</given-names>
            <surname>Cerdeira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <article-title>A centrality notion for graphs based on Tukey depth</article-title>
          ,
          <source>Appl. Math. Comput</source>
          .
          <volume>409</volume>
          (
          <year>2021</year>
          )
          <fpage>126409</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Donoho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gasko</surname>
          </string-name>
          ,
          <article-title>Breakdown Properties of Location Estimates Based on Halfspace Depth and Projected Outlyingness</article-title>
          ,
          <source>The Annals of Statistics</source>
          <volume>20</volume>
          (
          <year>1992</year>
          )
          <fpage>1803</fpage>
          -
          <lpage>1827</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>