<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Simple Heuristic for the Graph Tukey Depth Problem with Potential Applications to Graph Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Florian Seifarth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tamás Horváth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Wrobel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of Bonn</institution>
          ,
          <addr-line>Bonn</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer Center for Machine Learning</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <addr-line>Schloss Birlinghoven, Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We study a recently introduced adaptation of Tukey depth to graphs and discuss its algorithmic properties and potential applications to mining and learning with graphs. In particular, since it is NP-hard to compute the Tukey depth of a node, as a first contribution we provide a simple heuristic based on maximal closed set separation in graphs and show empirically on different graph datasets that its approximation error is small. Our second contribution is concerned with geodesic core-periphery decompositions of graphs. We show empirically that the geodesic core of a graph consists of those nodes that have a high Tukey depth. This information allows for a parameterized deterministic definition of the geodesic core of a graph.</p>
      </abstract>
      <kwd-group>
<kwd>graph Tukey depth</kwd>
        <kwd>geodesic closure</kwd>
        <kwd>closed set separations</kwd>
        <kwd>geodesic core-periphery decomposition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>[Fig. 1: comparison of degree centrality, closeness centrality, and betweenness centrality with Tukey depth, from small to large values.]</p>
      <sec id="sec-1-3">
<p>machine learning approaches that are based on hyper-plane separations [6, 7]. For ℝ^d and other more general metric spaces [8], Tukey depth has been studied in the context of machine learning, in particular, for object classification [9]. In the case of learning linear classifiers, it is shown in [10] that the Tukey median, i.e., the set of points with the highest Tukey depth, is related to the Bayes point.</p>
        <p>
An important advantage of Tukey depth over other centrality measures is that the exact geometric position of the points in ℝ^d is not relevant for their depth. Thus, the Tukey depth is relatively stable with respect to outliers and is therefore used for outlier detection [11, 12]. Moreover, as Tukey depth relies on separations, it is possible to adapt it to other domains enabling (abstract) hyper-plane or half-space separation. In particular, this property of the original Tukey depth is utilized in its adaptation to geodesic closure systems over graphs [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
Similarly to the fact that the computation of the Tukey depth in ℝ^d is NP-hard [13], it is also NP-hard to compute the graph Tukey depth of a node [
          <xref ref-type="bibr" rid="ref3">3</xref>
]. Motivated by this negative result, one of our main contributions is a heuristic algorithm for approximating graph Tukey depth. It runs in time polynomial in the size of the input graph and approximates the Tukey depth of a node with one-sided error by overestimating it. Our experimental results with small graphs clearly demonstrate that the approximation is close to the exact Tukey depth; note that for larger graphs we were not able to evaluate the approximation performance of our algorithm, as it was not possible to calculate the exact Tukey depth in practically feasible time.
        </p>
<p>Our heuristic is based on our greedy algorithm designed in [14] for solving the more general maximal closed set separation (MCSS) problem. In the particular case of graphs, the problem is as follows: Given two subsets A, B of the node set, find, if they exist, two (inclusion) maximal disjoint geodesically closed node sets that separate A and B from each other. As maximal disjoint closed sets provide a separation of sets, it is natural to ask the following question: Is there a connection between graph Tukey depth and node separation with geodesically closed node sets? We give an affirmative answer to this question by showing experimentally on small graph datasets that the solutions of the MCSS problem provide a good approximation of the exact Tukey depth. This mainly relies on the following fact: For any closed set containing at least one node of high Tukey depth, there exists no large disjoint closed set.</p>
<p>It follows from the definition of graph Tukey depth that it is related to other concepts based on geodesic convexity. One of these notions is the recent probabilistic definition of the geodesic core-periphery decomposition of graphs. It was introduced in [15] and studied in [15, 16, 17, 18]. Hence, our second question is concerned with the following problem: Is there a connection between graph Tukey depth and geodesic core-periphery decompositions? The geodesic core-periphery decomposition breaks up some types of graphs (including interaction graphs, such as social networks) into a dense core and a sparse “surrounding” periphery [15] (see Fig. 3 for a visual example). While some graphs (e.g., Erdős-Rényi, Barabási-Albert, and Watts-Strogatz random graphs) seem to have no periphery, others (e.g., trees and fully connected graphs) seem to have no core. Since this behavior is not yet well understood, we try to understand it by studying the nodes’ Tukey depth. It seems that the geodesic core of a graph consists of those nodes that are of high Tukey depth (see Fig. 4 for some examples). This observation allows for a parameterized deterministic definition of the cores. That is, the core of a graph can be defined as the set of those nodes that have a Tukey depth greater than a user-specified threshold. Our empirical results clearly demonstrate that, using the right threshold, the probabilistic definition of cores in [15] coincides with our deterministic one.</p>
<p>The rest of the paper is organized as follows. In Sec. 2 we first collect the necessary notions and notation. Sec. 3 contains some examples showing that graph Tukey depth is strongly related to mining and learning algorithms on graphs that rely on geodesic convexity. In Sec. 4 we present our heuristic for approximating graph Tukey depth and evaluate it empirically on small graph datasets. Finally, in Sec. 5 we mention some open questions for future research.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
<p>In this section, we collect the necessary notions and fix the notation. For a graph G = (V, E), V(G) and E(G) denote the set V of nodes and the set E of edges. Furthermore, n and m denote |V(G)| and |E(G)|, respectively. By graphs, we always mean undirected graphs without loops and multi-edges. For any u, v ∈ V(G), the (geodesic) interval [u, v] is the set of all nodes on all shortest paths between u and v (see Fig. 2a for an example). A set of nodes C ⊆ V(G) is called (geodesically) closed if for all u, v ∈ V(G), u, v ∈ C implies [u, v] ⊆ C. The closure ρ(S) of a set S ⊆ V(G) is the smallest closed set containing S (see Fig. 2b for an example of ρ({u, v})).</p>
      <p>
        The Graph Tukey Depth Problem For a graph G, the Tukey depth of a node v ∈ V(G) is defined as follows [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: Let C ⊆ V(G) be a closed set of maximum cardinality such that v ∉ C. Then the Tukey depth of v is defined by td(v) := |V(G)| − |C|. That is, v has a high Tukey depth if all closed sets not containing v are of small cardinality. We are interested in the graph Tukey depth problem defined as follows: Given a graph G, compute td(v) for all v ∈ V(G).
      </p>
      <p>
        Geodesic Cores Geodesic cores have been defined in [15] in a probabilistic way. Informally, the geodesic core of a graph consists of those nodes which are contained in most geodesically closed sets that are generated by a small number of random nodes. Of course, the core defined in this way can be empty. However, it turns out that this is not the case for interaction graphs (e.g., social networks) [15]. We define the geodesic core of a graph G, denoted by C_G, as follows. Let S_1, S_2, . . . be a sequence of sets, where each set consists of k &gt; 0 nodes selected independently and uniformly at random from V(G). Then C_G = ⋂_{i=1}^{t} ρ(S_i) is the core of G, where t is the smallest integer satisfying ⋂_{i=1}^{t} ρ(S_i) = ⋂_{i=1}^{t+1} ρ(S_i). Obviously, this definition is not deterministic since different choices of k and of the S_i can lead to different cores. Nevertheless, the experiments in [16] with large real-world networks show that for k ≈ 10, the core (if it exists) does not depend on the particular choice of the generator elements. The core-periphery decomposition of a graph is composed of the subgraph induced by the core nodes and that induced by the remaining nodes, called the periphery. In Fig. 3 we give a visual example of the core-periphery decomposition of the CA-GrQc network [19]1.
      </p>
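To make the interval and closure operators above concrete, the following minimal Python sketch computes both via a naive fixpoint over BFS distances. It assumes a connected graph; the example graph (a 4-cycle with a chord) and all function names are ours, not from the paper.

```python
from collections import deque
from itertools import combinations

def distances(adj):
    """All-pairs shortest-path lengths via BFS (graph assumed connected)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    q.append(w)
        dist[s] = d
    return dist

def interval(u, v, adj, dist):
    """Geodesic interval [u, v]: all nodes lying on a shortest u-v path."""
    return {w for w in adj if dist[u][w] + dist[w][v] == dist[u][v]}

def closure(S, adj, dist):
    """Smallest geodesically closed superset of S (fixpoint of intervals)."""
    C = set(S)
    changed = True
    while changed:
        changed = False
        for u, v in combinations(sorted(C), 2):
            new = interval(u, v, adj, dist) - C
            if new:
                C |= new
                changed = True
    return C

# 4-cycle 0-1-2-3 with chord 0-2
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
dist = distances(adj)
print(interval(1, 3, adj, dist))   # both shortest 1-3 paths pass through 0 or 2
print(closure({1, 3}, adj, dist))
```

Note that the fixpoint loop is quadratic in the closure size per pass; it is meant for illustration, not for the large networks discussed later.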
<p>MCSS Problem for Graphs We will use the maximal closed set separation (MCSS) problem restricted to geodesic closures over graphs (see [14] for the generic definition): Given a graph G = (V, E) and A, B ⊆ V(G) with disjoint geodesic closures, find two geodesically closed sets A′, B′ ⊆ V(G) with A′ ∩ B′ = ∅ and A ⊆ A′, B ⊆ B′ that are maximal, i.e., there exist no proper supersets of A′ resp. B′ that fulfill the above properties.</p>
<p>1This network is built by the co-authorships in the general relativity and quantum cosmology community.</p>
      <p>[Fig. 3: (a) the entire CA-GrQc network, (b) its (geodesic) core, and (c) its periphery.]</p>
    </sec>
    <sec id="sec-2-1">
      <title>3. Potential Applications to Mining and Learning with Graphs</title>
      <p>Before we turn to the algorithmic aspects of the graph Tukey depth problem, in this section we first present some of its potential applications to mining and learning with graphs. In particular, this section deals with some connections of graph Tukey depth to node separations and to geodesic core-periphery decompositions. We state three important properties of Tukey depth. In particular, Proposition 1 clarifies the role of Tukey depth in the context of geodesically closed sets.</p>
      <p>Proposition 1. Let G be a graph and v ∈ V(G) with td(v) = n − k, and let C ⊆ V(G) be a geodesically closed node set with |C| &gt; k. Then v ∈ C.</p>
      <p>
        Proposition 2 ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Let G be a graph, S ⊆ V(G), and C be the closure of S. Then the graph Tukey depth is a quasi-concave function, i.e., for all v ∈ C we have td(v) ≥ min{td(u) : u ∈ S}.
        Proposition 3 ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Let G be a graph, k ∈ ℕ, and C = {v ∈ V(G) : td(v) ≥ k}. Then C is geodesically closed.
      </p>
<p>To underline the importance of these three statements, we give two examples that show how they (can) influence machine learning and data mining methods based on geodesic closures. Example 1: Node Classification and Active Learning In [14, 20, 21, 22], disjoint half-spaces and closed sets are used for binary classification in closure systems, for node classification, and for active learning in graphs using geodesic convexity. Given the Tukey depth td(v) of a node v, Proposition 1 immediately implies that a separating half-space or closed set not containing v cannot have a cardinality greater than n − td(v). Thus, for nodes of high Tukey depth, there is no large geodesically closed set not containing them. Hence, Proposition 1 implies a nice theoretical connection between graph Tukey depth and the maximum size of separating half-spaces and closed sets. Using approximate graph Tukey depths, the predictive performance of all the above methods can possibly be improved.</p>
<p>[Fig. 4: exact Tukey depths (from small to large) and geodesic cores of three example networks.]</p>
      <p>
Example 2: Geodesic Core-Periphery Decomposition The geodesic core-periphery decomposition of graphs was analyzed in [15, 16, 17]. In particular, it was found in [15, 17] that many social networks consist of a dense geodesic core “surrounded” by a sparse periphery (see Fig. 3 for an example). While some graphs, especially tree-like graphs, seem to have no core, others, such as graphs sampled from random graph models (e.g., Erdős-Rényi, Barabási-Albert, and Watts-Strogatz), seem to have no periphery. Moreover, the closure of a small number of randomly chosen graph nodes (≈ 10) always contains the geodesic core (if it exists). Furthermore, if the nodes are sampled from the geodesic core only, then the closure of the nodes is the geodesic core itself. If we compute the closure of, say, 10 randomly chosen nodes from the entire network (Fig. 3a), then the closure always contains the core (orange nodes in Fig. 3b). If all random nodes belong to the core (orange nodes in Fig. 3a), then their closure is the core itself. The above statements explain this behavior. Using that the core is always contained in the closure of a small number of randomly chosen nodes, it follows from Proposition 2 that the nodes in the core are those with the highest Tukey depths. Moreover, the quasi-concave property of graph Tukey depth implies that if the core is generated by a few nodes from the core, then the core nodes must have very similar Tukey depths. Finally, by Proposition 3, the set of nodes in a graph with a Tukey depth above some threshold is always geodesically closed; cores arise as a special case of this property. These three properties motivate the following deterministic definition of geodesic cores:
Definition 1. The k-geodesic core of a graph G is defined by</p>
      <p>C_G^k := {v ∈ V(G) : td(v) ≥ k} .</p>
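Definition 1 can be checked directly on toy graphs. The sketch below computes exact Tukey depths by brute-force enumeration of all geodesically closed sets (exponential, so tiny graphs only) and derives the k-geodesic core from them; the helper names are ours, and this is not the ILP-based exact algorithm of [3].

```python
from collections import deque
from itertools import combinations

def distances(adj):
    """All-pairs BFS distances (graph assumed connected)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    q.append(w)
        dist[s] = d
    return dist

def is_closed(C, nodes, dist):
    """C is geodesically closed iff no outside node lies on a shortest
    path between two members of C."""
    return all(dist[u][w] + dist[w][v] != dist[u][v]
               for u, v in combinations(C, 2)
               for w in nodes if w not in C)

def exact_tukey_depths(adj):
    """td(v) = |V(G)| - max{|C| : C closed, v not in C} (brute force)."""
    nodes = list(adj)
    n = len(nodes)
    dist = distances(adj)
    closed_sets = [set(C) for r in range(n + 1)
                   for C in combinations(nodes, r)
                   if is_closed(C, nodes, dist)]
    return {v: n - max(len(C) for C in closed_sets if v not in C)
            for v in nodes}

def k_geodesic_core(adj, k):
    """Definition 1: all nodes of Tukey depth at least k."""
    return {v for v, d in exact_tukey_depths(adj).items() if d >= k}

# Path 0-1-2-3: the inner nodes have depth 2, the endpoints depth 1.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(exact_tukey_depths(path))
print(k_geodesic_core(path, 2))
```

By Proposition 3, the returned core is itself geodesically closed; on the path example, the 2-geodesic core is the inner interval {1, 2}.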
      <p>
        To empirically confirm our claim that the core contains the nodes with the highest Tukey depths, consider the three graphs in Fig. 4. We computed their exact Tukey depths (top) and their geodesic cores (bottom). For the Karate Club network (left), considered also in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the core exactly matches the set of nodes of the highest Tukey depth. Furthermore, there is not much fluctuation in the depths of the core nodes. In fact, all nodes of Tukey depth at most 3 belong to the periphery and all nodes of Tukey depth 19 or 21 to the core; note that there are no nodes of Tukey depth between 4 and 18. In the case of the Les Misérables character network (middle), there is only a single node with a very high Tukey depth of 57, surrounded by nodes of depth less than 35. In this case, the core algorithm returns only the node with the highest Tukey depth, showing that the graph Tukey depth can possibly be used to improve core-periphery decompositions. This is also the case for the Dolphin community graph (right), where the core consists of nodes with Tukey depth greater than 2, while all nodes in the periphery have a Tukey depth of at most 2.
      </p>
    </sec>
    <sec id="sec-3">
      <title>4. Approximating the Tukey Depth</title>
<p>Motivated by the negative complexity result concerning the calculation of the Tukey depth, in Sec. 4.1 below we first propose a heuristic based on the algorithm in [14] that solves the MCSS problem. It approximates the nodes’ Tukey depths with one-sided error. We show experimentally on different types of small graphs that the results obtained by our heuristic are fairly close to the exact ones. Furthermore, even on small graphs, our algorithm is up to 200 times faster than the exact one (Sec. 4.2). It is important to emphasize that we had to resort to such graphs, as it was not possible to calculate the exact Tukey depths for larger graphs in a practically feasible time.</p>
      <sec id="sec-3-1">
        <title>4.1. The Heuristic</title>
        <p>
          Recall that the exact Tukey depth of a node v is defined by td(v) := |V(G)| − |C|, where |C| is the maximum cardinality of a closed set C not containing v. It can be computed exactly using an integer linear program (see [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for the details). The computationally hard part of the problem is to find a closed set of maximum cardinality. Our heuristic addresses this problem by considering an inclusion maximal closed set instead of a maximum cardinality closed set. This relaxation, which of course distorts the exact value of the Tukey depth, allows us to apply the efficient greedy algorithm proposed in [14] for solving the MCSS problem. In what follows, for any v ∈ V(G), t̂d(v) denotes the approximation of td(v) obtained with our heuristic.
        </p>
<p>Given a graph G, the rough idea to approximate the Tukey depth of a node v ∈ V(G) is to find an inclusion maximal geodesically closed set C ⊆ V(G) with v ∉ C. Such a set C can be found by applying the MCSS algorithm [14] with input {v} and {v′}. The output of the algorithm is a solution of the MCSS problem, i.e., it consists of two closed node sets C, C′ ⊆ V(G) with v ∈ C and v′ ∈ C′ that are disjoint and inclusion maximal. That is, there exist no proper closed supersets of C resp. C′ with the same properties. The Tukey depth can then be approximated using the cardinalities of C resp. C′. For a fixed node v, the result depends on the particular choice of v′. To improve the approximation quality, we therefore call the MCSS algorithm for each node v several times, with different nodes v′ ≠ v.</p>
<p>The pseudo-code of the above heuristic is given in Alg. 1. In Line 1 we initialize the Tukey depth of all nodes in G by setting it to the maximum possible value, i.e., to |V(G)|. We repeat the procedure described above for all nodes v ∈ V(G) and all their neighbors v′ ∈ Γ(v) (see the for-loops in Lines 2 and 3).</p>
        <p>Algorithm 1: Approximation of Graph Tukey Depth</p>
        <p>Input: graph G</p>
        <p>Output: approximation t̂d(v) of td(v) for all v ∈ V(G)
1 t̂d(v) ← |V(G)| for all v ∈ V(G);
2 forall v ∈ V(G) do
3     forall v′ ∈ Γ(v) do
4         C′, C ← MCSS({v′}, {v});
5         forall u ∈ V(G) do
6             if u ∉ C′ then
7                 t̂d(u) ← min{t̂d(u), |V(G)| − |C′|};
8             if u ∉ C then
9                 t̂d(u) ← min{t̂d(u), |V(G)| − |C|};
10 return t̂d(v) for all v ∈ V(G)</p>
        <p>In this way we solve the MCSS problem for all input sets {v}, {v′}, i.e., we separate v from all of its neighbors v′ by maximal disjoint closed sets C, C′ (see Line 4). Note that the Tukey depth of a node u is based on a closed set of maximum cardinality not containing u. Thus, if u does not lie in the set C (resp. C′), then the cardinality of C (resp. C′) is smaller than or equal to that of a closed set of maximum cardinality not containing u. Hence, we can update the current Tukey depth approximation of all nodes u ∈ V(G) as follows: take the minimum of the old approximation and the new one, which is the cardinality of V(G) ∖ C if u ∉ C, or that of V(G) ∖ C′ if u ∉ C′ (see Lines 7 and 9).</p>
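As a concrete illustration, Alg. 1 can be sketched in Python roughly as follows. This is a naive sketch under our own assumptions (not the authors' code): the geodesic closure is a simple fixpoint over BFS distances, and MCSS is a single-pass greedy extension in the spirit of [14]; all helper and variable names are ours.

```python
from collections import deque
from itertools import combinations

def distances(adj):
    """All-pairs BFS distances (graph assumed connected)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    q.append(w)
        dist[s] = d
    return dist

def closure(S, nodes, dist):
    """Smallest geodesically closed superset of S (fixpoint of intervals)."""
    C = set(S)
    changed = True
    while changed:
        changed = False
        for u, v in combinations(sorted(C), 2):
            for w in nodes:
                if w not in C and dist[u][w] + dist[w][v] == dist[u][v]:
                    C.add(w)
                    changed = True
    return C

def mcss(A, B, nodes, dist):
    """Greedy maximal closed set separation of A and B (sketch of [14])."""
    Ca, Cb = closure(A, nodes, dist), closure(B, nodes, dist)
    if not Ca.isdisjoint(Cb):
        return None  # closures intersect: A and B are not separable
    for z in nodes:  # one greedy pass suffices for inclusion maximality
        if z in Ca or z in Cb:
            continue
        Cz = closure(Ca | {z}, nodes, dist)
        if Cz.isdisjoint(Cb):
            Ca = Cz
        else:
            Cz = closure(Cb | {z}, nodes, dist)
            if Cz.isdisjoint(Ca):
                Cb = Cz
    return Ca, Cb

def approx_tukey_depth(adj):
    """Alg. 1: overestimate td(v) for every node v."""
    nodes = sorted(adj)
    n = len(nodes)
    dist = distances(adj)
    td_hat = {v: n for v in nodes}              # Line 1: maximum value
    for v in nodes:                             # Line 2
        for v2 in adj[v]:                       # Line 3: neighbors of v
            res = mcss({v2}, {v}, nodes, dist)  # Line 4
            if res is None:
                continue
            C2, C = res
            for u in nodes:                     # Lines 5-9: update all nodes
                if u not in C2:
                    td_hat[u] = min(td_hat[u], n - len(C2))
                if u not in C:
                    td_hat[u] = min(td_hat[u], n - len(C))
    return td_hat
```

On trees this sketch reproduces the exact depths (e.g., on the path 0–1–2–3 it returns depths 1, 2, 2, 1), in line with the remark in Sec. 5 that the heuristic is exact on trees; in general it only overestimates (Proposition 4).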
<p>By construction, Alg. 1 finds only maximal and not maximum closed sets, resulting in a one-sided error in the estimate of the Tukey depths. This is formulated in the proposition below.
Proposition 4. Alg. 1 overestimates the Tukey depth, i.e., for the output t̂d(v) returned by Alg. 1 we have t̂d(v) ≥ td(v) for all v ∈ V(G).</p>
<p>Regarding the runtime of Alg. 1, note that the inner part of the loop in Lines 3–9 is executed O(m) times because we iterate over all neighbors (i.e., every edge is considered twice). The runtime of the inner loop (Lines 3–9) is dominated by the MCSS algorithm called in Line 4, which calls the closure operator at most O(n) times [14]. Since the geodesic closure can be computed in time O(nm) [26], we have the following result for the total runtime of Alg. 1:
Proposition 5. Alg. 1 returns an upper bound of the Tukey depth for all nodes of G in O(n²m²) time.</p>
<p>The runtime of the approximation algorithm can be improved by considering, for each node v, only a fixed number of distinct nodes v′, or by considering a fixed subset U ⊆ V(G) instead of the whole node set V(G) in the outer loop (see Lines 2–9). It is left to further research to analyze how these changes affect the quality of the approximation.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Experimental Evaluation</title>
        <p>
          In this section we empirically evaluate the approximation quality and runtime of Alg. 1 on
datasets containing small graphs2. Regarding the approximation quality, we compare the results
obtained by our heuristic algorithm to the exact Tukey depths computed with the algorithm
in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. For the evaluation, we consider 19 graph datasets of different types (small molecules, small graphs from bioinformatics and computer vision, and small social networks) from [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
(see columns 2–4 of Tab. 1 for the number of graphs and their average numbers of nodes and edges). The average number of nodes of the graphs ranges from 14 (PTC_MM) up to 82 (OHSU); their average number of edges ranges from 14 to 200. The reason for considering small graphs only is that the exact algorithm [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] was unable to calculate the Tukey depth for larger graphs within one day (see the last two columns of Tab. 1). For practical reasons, we removed all disconnected graphs from the original datasets; note that our heuristic works for disconnected graphs as well.
        </p>
<p>The results are presented in Tab. 1. It contains the approximation qualities measured in different ways (columns 5–10) and the runtimes of the exact algorithm (column 11) and of our heuristic (column 12). The datasets are sorted according to their absolute approximation error (column 5 of Tab. 1), i.e., the sum of all differences between the approximated and the exact Tukey depth over all nodes and all graphs in the dataset.</p>
<p>Regarding the absolute error (column 5), our approximation results are equal to the exact Tukey depths for 5 out of the 19 datasets, while their computation was faster by a factor of up to 100 (see PTC_MM). Our algorithm has the largest absolute error of 4155 on the COIL-DEL graphs; note, however, that this dataset consists of 3900 graphs, so the average error per graph is only slightly above one. Additionally, we look at the relative errors (column 6), i.e., the absolute error divided by the sum of all depths. This measure confirms that our algorithm performs very well: the relative errors are below 4 · 10⁻³ for all graph datasets. The per-node error (column 7) is the average error our algorithm makes per node, while the per-graph error (column 8) is the error it makes on average per graph. Regarding the per-node error, the worst case occurs for the COIL-DEL dataset (last row), with an average error of 0.05. For the per-graph error, the worst result was obtained for the OHSU dataset, where we overestimate the sum of all node depths by 1.65 per graph on average. This shows that our approximation algorithm performs very well, especially when considering the average results over the datasets.</p>
<p>Finally, we also studied the worst-case approximations for nodes and graphs. In particular, the columns Max. Node Error resp. Max. Graph Error denote the maximum error of the algorithm on single nodes resp. on single graphs. The results show that there is a very low error of at most 3 per node for 13 out of the 19 datasets. For three graph datasets, the maximum error per node is at most 7, and in three cases we have a maximum error between 11 and 19. Regarding the maximum error per graph, a similar behavior can be observed; note that except for OHSU and Peking_1, the maximum node errors and maximum graph errors are close to each other. This implies that there are only a few nodes with a high approximation error. It is an interesting problem to find the structural properties of such nodes and graphs that are responsible for the high approximation errors. The last two columns show the runtimes of the two algorithms. Our algorithm (last column) is faster than the exact one by at least one order of magnitude.</p>
        <p>2See https://github.com/fseifarth/AppOfTukeyDepth for the code.</p>
        <p>[Table 1: for each dataset, the number of graphs, the average numbers of nodes and edges, the absolute, relative, per-node, and per-graph approximation errors, the maximum node and graph errors, and the runtimes (s) of the exact algorithm and of Alg. 1.]</p>
        <p>In summary, the evaluation of Alg. 1 clearly demonstrates that our heuristic performs well in approximating the graph Tukey depth. It is faster (sometimes more than 100 times) than the exact algorithm, even on small graph datasets. Regarding larger graphs, this gap in runtime will increase because of the exponential runtime of the exact algorithm. Additionally, the very small relative errors (at most 4 · 10⁻³), the average errors (at most 1.65 per graph), and also the worst-case errors show that the algorithm can be used effectively for further applications based on the Tukey depth (see Sec. 3).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Concluding Remarks</title>
<p>Graph Tukey depth appears to be an interesting and promising new concept for mining and learning with graphs. The study of the relationship of graph Tukey depth to other node centrality measures is an interesting question for further research (cf. Fig. 1). For example, while the centroid(s) in trees [27, 28] are exactly the nodes with the highest Tukey depth, this is not necessarily the case for more general graphs beyond trees.</p>
<p>Another important issue is to better understand the semantics of graph Tukey depth. For example, what are the properties of the nodes with the highest depth (cf. the definition of the Tukey median in ℝ^d)? We have empirically demonstrated that graph Tukey depth can be closely approximated in small graphs. It is an open question whether this result holds for (very) large graphs as well. To answer this question, the scalability of our approximation algorithm should be improved on the one hand; on the other hand, one needs (possibly tight) theoretical upper bounds on graph Tukey depths. Another interesting question is to identify graph classes for which our heuristic always returns the exact graph Tukey depth. While this is the case for trees, it is unclear whether it holds for outerplanar graphs as well. We believe that this question can be answered affirmatively by using techniques from [16]. We have shown experimentally that there is a connection between geodesic core-periphery decompositions and the deterministic definition of k-geodesic cores. This suggests that using our core approximation algorithm [16], we can closely approximate the set of nodes with the highest Tukey depth.</p>
<p>Acknowledgments</p>
      <p>This research has been funded by the Federal Ministry of Education and Research of Germany and the state of North Rhine-Westphalia as part of the Lamarr-Institute for Machine Learning and Artificial Intelligence, LAMARR22B. The authors gratefully acknowledge this support.</p>
      <p>[5] J. W. Tukey, Mathematics and the picturing of data, Proceedings of the International</p>
      <p>Congress of Mathematicians, Vancouver, 1975 2 (1975) 523–531.
[6] F. Rosenblatt, The perceptron: A probabilistic model for information storage and
organization in the brain, Psychological Review 65 (1958) 386–408.
[7] B. E. Boser, I. M. Guyon, V. N. Vapnik, A training algorithm for optimal margin classifiers,</p>
      <p>COLT, ACM, 1992, pp. 144–152.
[8] X. Dai, S. Lopez-Pintado, for the Alzheimer’s Disease Neuroimaging Initiative, Tukey’s
depth for object data, Journal of the American Statistical Association (2022) 1–13.
[9] P. Mozharovskyi, Contributions to depth-based classification and computation of the</p>
      <p>Tukey depth, Ph.D. thesis, University of Cologne, 2015.
[10] R. Gilad-Bachrach, A. Navot, N. Tishby, Bayes and Tukey meet at the center point, in:</p>
      <p>Learning Theory, Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp. 549–563.
[11] C. Becker, U. Gather, The masking breakdown point of multivariate outlier identification
rules, Journal of the American Statistical Association 94 (1999) 947–955.
[12] D. Bremner, D. Chen, J. Iacono, S. Langerman, P. Morin, Output-sensitive algorithms for
Tukey depth and related problems, Statistics and Computing 18 (2008) 259–266.
[13] D. Johnson, F. Preparata, The densest hemisphere problem, Theor. Comput. Sci. 6 (1978)
93–107.
[14] F. Seifarth, T. Horváth, S. Wrobel, Maximal closed set and half-space separations in finite
closure systems, in: ECML PKDD 2019, volume 11906 of LNCS, Springer, 2019, pp. 21–37.
[15] M. Tilen, L. Šubelj, Convexity in complex networks, Network Science 6 (2018) 176–203.
[16] F. Seifarth, T. Horváth, S. Wrobel, A fast heuristic for computing geodesic cores in large
networks, 2022. URL: https://arxiv.org/abs/2206.07350.
[17] L. Šubelj, Convex skeletons of complex networks, J. R. Soc. Interface 15 (2018).
[18] L. Šubelj, D. Fiala, T. Ciglaric, L. Kronegger, Convexity in scientific collaboration networks,</p>
      <p>Journal of Informetrics 13 (2019) 10–31.
[19] J. Leskovec, A. Krevl, SNAP Datasets: Stanford large network dataset collection, http:
//snap.stanford.edu/data, 2014.
[20] P. H. M. de Araújo, M. B. Campêlo, R. C. Corrêa, M. Labbé, The geodesic classification
problem on graphs, volume 346 of Elec. Notes Theor. Comput. Sci., Elsevier, 2019, pp. 65–76.
[21] F. Seifarth, T. Horváth, S. Wrobel, Maximum margin separations in finite closure systems,
in: ECML PKDD 2020, volume 12457 of LNCS, Springer, 2020, pp. 3–18.
[22] M. Thiessen, T. Gärtner, Active learning of convex halfspaces on graphs, in: Advances in</p>
      <p>Neural Information Processing Systems, 2021.
[23] W. W. Zachary, An information flow model for conflict and fission in small groups, Journal
of Anthropological Research 33 (1977) 452–473.
[24] D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing,
Association for Computing Machinery, 1993.
[25] D. Lusseau, The emergent properties of a dolphin social network, Proceedings of the</p>
      <p>Royal Society of London. Series B: Biological Sciences 270 (2003).
[26] I. M. Pelayo, Geodesic Convexity in Graphs, Springer New York, 2013.
[27] W. Piotrowski, A generalization of branch weight centroids, Appl. Math. 19 (1987) 541–545.
[28] W. Piotrowski, M. M. Syslo, Some properties of graph centroids, Ann. Oper. Res. 33 (1991)
227–236.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <source>Networks: An Introduction</source>
          , Oxford University Press, Inc., USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Kriege</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mutzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <article-title>TUDataset: A collection of benchmark datasets for learning with graphs</article-title>
          ,
          <source>in: ICML, GRL Workshop</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. O.</given-names>
            <surname>Cerdeira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <article-title>A centrality notion for graphs based on Tukey depth</article-title>
          ,
          <source>Appl. Math. Comput</source>
          .
          <volume>409</volume>
          (
          <year>2021</year>
          )
          <fpage>126409</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Donoho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gasko</surname>
          </string-name>
          ,
          <article-title>Breakdown Properties of Location Estimates Based on Halfspace Depth and Projected Outlyingness</article-title>
          ,
          <source>The Annals of Statistics</source>
          <volume>20</volume>
          (
          <year>1992</year>
          )
          <fpage>1803</fpage>
          -
          <lpage>1827</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>