<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Performance for a Given Downstream Task?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pavel Procházka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michal Mareš</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marek Dědič</string-name>
          <email>marek@dedic.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cisco Systems, Inc.</institution>
          ,
          <addr-line>Karlovo náměstí 10, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Czech Technical University in Prague</institution>
          ,
          <addr-line>Technická 2, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Czech Technical University in Prague</institution>
          ,
          <addr-line>Trojanova 13, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Machine learning algorithms on graphs, in particular graph neural networks, have become a popular framework for solving various tasks on graphs, attracting significant interest in the research community in recent years. As presented, however, these algorithms usually assume that the input graph is fixed and well-defined and do not consider the problem of constructing the graph for a given practical task. This work proposes a methodical way of linking graph properties with the performance of a GNN solving a given task on such a graph, via a surrogate regression model that is trained to predict the performance of the GNN from the properties of the graph dataset. Furthermore, the GNN model hyper-parameters are optionally added as additional features of the surrogate model, and it is shown that this technique can be used to solve the practical problem of hyper-parameter tuning. We experimentally evaluate the importance of graph properties as features of the surrogate model with regard to the node classification task for several common graph datasets and discuss how these results can be used for graph composition tailored to the given task. Finally, our experiments indicate a significant gain of the proposed hyper-parameter tuning method compared to the reference grid-search method. Keywords: graph neural networks; model performance prediction; hyper-parameter tuning; node classification.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>1. Introduction</title>
      <p>Across a wide variety of applications and domains, graphs emerge as a ubiquitous way of organizing data. Consequently, machine learning on graphs has, in recent years, seen an explosion in popularity, breadth and depth of both research and applications. At the same time, the underlying graph topology has, until recent works [1, 2], received much less attention. Specifically, the organization of data points into nodes and edges of a graph is usually assumed to be given, unambiguous, and well-defined, especially in works utilizing common, publicly available graph datasets that have a pre-defined topology. While in the research environment this may be considered beneficial in simplifying the comparison of various graph-based methods, in many practical applications, the mapping from data to graphs is a non-trivial and open problem. An example of such an application domain is computer network security, where a graph representation of network telemetry may contain entities of various types (users, servers, IP addresses), the edges may represent either a physical connection between two entities or a more general similarity or distance measure, both the nodes and edges may have associated with them additional features, the full dataset may be prohibitively large to efficiently process, and some data points may not …</p>
      <p>In our work, we investigate the problem of creating a … To solve the problem of predicting the suitability of a representation to a given task, we represent the graph dataset by its properties (Section 2.2), that are aggregate values representing the whole graph dataset instead of individual nodes or edges. A GNN model [3] is trained to solve the task, and its performance is measured using several metrics. The main aim of this work is to extract information about the usefulness of individual graph properties from a meta-model that is trained on the graph dataset properties to predict the GNN performance represented by the performance metrics. In the reported research, we propose and evaluate … however, it forms a useful and general tool for evaluating the suitability of graph datasets to tasks on them, which is a basic prerequisite to solving the more general problem of constructing an advantageous graph representation.</p>
      <p>ITAT’23: Information technologies – Applications and Theory, September … ∗Corresponding author. ORCID: 0000-0003-1021-8428 (M. Dědič)</p>
      <p>1.2. Related work
Machine learning model performance prediction is commonly used to avoid the expensive evaluation of the original model on the test set [4]. However, the problem of trust in these meta-models limits their applicability in real-world scenarios. To address this problem, the authors in [5] propose attaching prediction uncertainty to the meta-models and suggest a method for evaluating this uncertainty. In [6], the authors observe that state-of-the-art shift detection metrics (referred to as graph properties in our paper) do not generalize well across datasets, and they propose incorporating error predictors. In this paper, we address both the trust and generalization problems. The novelty of this paper lies in our use of the meta-model: firstly, for interpreting the graph properties that drive the model’s performance, and secondly, for hyper-parameter tuning. To the best of our knowledge, there is no existing use of the meta-model for these purposes in the current state of the art.</p>
      <p>Graph theory encompasses various numeric graph properties, ranging from basic ones such as the number of nodes, to more sophisticated metrics like graph curvature [1]. In this paper, we select a subset of these metrics, listed in Table 1, as features for the meta-model.</p>
      <p>Graph Neural Networks (GNNs) [3] achieve superior performance on graph datasets. However, this performance often comes at the cost of high computational resources required for training. Additionally, the large configuration space of these models necessitates non-trivial resources for fine-tuning. Our research aims to reduce the required computational resources in two ways. Firstly, we attempt to construct a graph dataset from the source data with favorable properties for GNN execution. Secondly, the proposed hyper-parameter search aims to reduce computational resources during fine-tuning.</p>
      <p>Shapley Additive Explanations (SHAP) [7] is a framework for explaining predictions of any model based on coalition game theory concepts introduced in [8]. An additional benefit of this framework is its ability to show whether low or high values of the input variables contribute to low/high predictions of the model. In this paper, we adopt the SHAP framework for model explanation.</p>
      <p>1.3. Contribution
• We propose a method to identify important graph dataset properties using the meta-model.
• We present a hyper-parameter tuning method based on the meta-model.
• We experimentally verify the generalization capability of the meta-model.
• We evaluate the importance of graph properties and their impact on GNN performance.
• We experimentally validate the hyper-parameter tuning approach with very promising results.</p>
      <p>2. Graph representation for GNN performance prediction
2.1. Notation and definitions
Consider an undirected graph G = (V, E, X) with nodes V, edges E ⊆ V², and real-valued node features X ∈ ℝ^(N×d), where N = |V|. In this work, we limit the definition of a graph task to be one of transductive node classification; however, the method as defined is general and can be applied to other tasks such as inductive node classification or link prediction. In the transductive setting, a task on graph G can be viewed as an assignment of node labels y (belonging to one of c classes) to the graph. Using a model M, a prediction ŷ = M(G) is obtained for the task and compared to the ground truth using a performance metric m(y, ŷ).</p>
      <p>2.2. Graph representation
Our goal is to find a set of graph dataset properties P such that those properties would only keep global-level information about the graph G and at the same time provide as much information as possible about the performance m obtainable on G. We offer a range of graph dataset properties (see Table 1). We categorize these properties into three types of information. Specifically, these properties can convey information regarding: 1) node attributes, 2) graph structure, 3) the specified task, or any combination thereof (awareness in Table 1).</p>
      <p>Apart from basic graph properties and well-established metrics on graphs, we consider some additional graph properties for better description. In order to define these additional non-standard properties formally, we denote V_c ⊆ V the set of nodes belonging to the class c and |V_c| its size. The mean attribute vector over the class c is then given as x̄_c = (1/|V_c|) Σ_{v∈V_c} x_v, where x_v denotes the attribute vector of the corresponding node v. Finally, we define the mean squared distance between attributes in class c₁ and the mean of attributes in class c₂ (attribute similarity) as

s(c₁, c₂) = (1/|V_{c₁}|) Σ_{v∈V_{c₁}} (x_v − x̄_{c₂})².   (1)

This asymmetric quantity is used to express similarity between attributes based on the task (see Table 1).</p>
    </sec>
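<p>Equation (1) can be made concrete with a small plain-Python sketch (the function and variable names here are illustrative, not from the paper):

```python
def class_mean(attrs, labels, c):
    # Mean attribute vector over the nodes of class c.
    members = [a for a, y in zip(attrs, labels) if y == c]
    dim = len(members[0])
    return [sum(a[i] for a in members) / len(members) for i in range(dim)]

def attribute_similarity(attrs, labels, c1, c2):
    # s(c1, c2): mean squared distance between the attributes of nodes in
    # class c1 and the mean attribute vector of class c2, per Equation (1).
    mean_c2 = class_mean(attrs, labels, c2)
    members = [a for a, y in zip(attrs, labels) if y == c1]
    return sum(
        sum((ai - mi) ** 2 for ai, mi in zip(a, mean_c2)) for a in members
    ) / len(members)
```

Note the asymmetry: attribute_similarity(attrs, labels, 1, 0) and attribute_similarity(attrs, labels, 0, 1) generally differ, which is exactly the property used by the last rows of Table 1.</p>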
<sec id="sec-2">
      <title>Table 1: Graph dataset properties used as meta-model features</title>
      <p>Each property is listed with its awareness of (node attributes / graph structure / task).
Node count (No/No/No): number of nodes – dataset size.
Class ratio (No/No/Yes): ratio between the number of positive and negative nodes.
Number of components (No/Yes/No): number of connected components of the graph.
Average node degree (No/Yes/No): average node degree in the graph.
Global assortativity (No/Yes/No): measure of the tendency of nodes to connect with other similar nodes, rather than dissimilar nodes [9].
Attribute similarity (Yes/Yes/No): average cosine similarity of attributes across all edges in the graph.
Attribute homophily (No/Yes/Yes): measure of how clustered together are nodes with similar attributes [10].
Edge homophily (No/Yes/Yes): fraction of edges connecting nodes of the same class [11].
Node homophily (No/Yes/Yes): fraction of node neighbours having the same class as the node in question, averaged over all nodes [12].
Class homophily (No/Yes/Yes): a modification of node homophily that is invariant to the number of classes [13].
Ratio of positive nodes of degree &gt; 1 (No/Yes/Yes): ratio of positive nodes with degree greater than one.
Fraction of positive nodes of degree &gt; 2 (No/Yes/Yes): the fraction of positive nodes with degree greater than two, out of those with degree greater than one.
Average positive node degree (No/Yes/Yes): average node degree in the sub-graph restricted to nodes from V_1.
Relative presence of positive edges (No/Yes/Yes): number of edges connecting positive nodes, divided by the number of edges that would be present in a theoretical clique constructed of all positive nodes.
Positive attribute similarity (Yes/No/Yes): s(1, 1) – see Equation (1).
Positive to negative attribute similarity (Yes/No/Yes): s(1, 0)/s(1, 1) – see Equation (1).
Negative to positive attribute similarity (Yes/No/Yes): s(0, 1)/s(1, 1) – see Equation (1).</p>
    </sec>
    <sec id="sec-3">
      <title>2.3. GNN performance prediction</title>
      <p>Based on the graph dataset properties, we consider a meta-model M_meta, which makes a prediction m̂ of the true performance m based on the properties {P}.</p>
    </sec>
    <sec id="sec-4">
      <title>2.4. Multiple binary classification</title>
      <p>To train and evaluate the meta-model, a sufficient dataset is needed. Given that the individual points in this dataset themselves correspond to graph-task pairs and models trained on them, obtaining such a dataset for the meta-model is computationally expensive. To aid with its creation, only binary classification tasks were considered, where for datasets with more than 2 classes, multiple tasks were constructed by taking one class as positive and the other classes as negative, for each class in the original dataset. This procedure has its motivation in applications, where e.g. in the domain of computer security, a classifier distinguishing each particular kind of malware is a useful addition to a general malware classifier.</p>
    </sec>
    <sec id="sec-5">
      <title>2.5. Measuring graph property usefulness</title>
      <p>We train a regression meta-model on a dataset consisting of graph properties (features of the meta-model) and the corresponding GNN performance metric (label for the meta-model). If the regression model generalizes well, we consider the graph properties that are important for the meta-model prediction to also be important for the GNN performance on a graph with the given properties.</p>
      <p>By evaluating the meta-model’s performance on the test set and determining the important features of the meta-model (e.g., using SHAP), we propose applying the meta-model explanation to determine the impact of individual graph properties on the GNN performance. The validity of this claim is assessed through the meta-model’s performance on the test set.</p>
    </sec>
    <sec id="sec-6">
      <title>2.6. Hyper-parameter optimization</title>
      <p>In order to apply the meta-model to hyper-parameter optimization, the hyper-parameter values are added to the features of the meta-model.</p>
    </sec>
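<p>Two of the task-aware properties in Table 1 are cheap to compute directly from an edge list; a plain-Python sketch (the edge list and label list are illustrative inputs, not the paper's code):

```python
from collections import defaultdict

def edge_homophily(edges, labels):
    # Fraction of edges connecting nodes of the same class [11].
    return sum(1 for u, v in edges if labels[u] == labels[v]) / len(edges)

def node_homophily(edges, labels):
    # Fraction of a node's neighbours sharing its class, averaged over
    # all nodes that have at least one neighbour [12].
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    per_node = [
        sum(labels[n] == labels[v] for n in ns) / len(ns)
        for v, ns in neighbours.items()
    ]
    return sum(per_node) / len(per_node)
```

The two measures deliberately differ: edge homophily weights high-degree nodes more, while node homophily averages per node.</p>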
<sec id="sec-17">
      <title>-</title>
      <p>[Figure 1: Construction of the meta-model dataset. For each task and hyper-parameter setup, the basic graph properties, the task-specific properties and the GNN hyper-parameters form one data-point, labelled by the measured GNN performance.]</p>
      <p>3. Experimental evaluation
3.1. Experiment description</p>
    </sec>
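<p>The multiple-binary-classification construction of Section 2.4 (one task per class, with that class positive and all others negative) can be sketched as:

```python
def binary_tasks(labels):
    # For a dataset with k classes, build k binary label vectors:
    # class c becomes positive (1), every other class negative (0).
    return {
        c: [1 if y == c else 0 for y in labels]
        for c in sorted(set(labels))
    }
```

For a 40-class dataset such as ArXiv this yields 40 binary tasks, which is how a single graph contributes many data-points to the meta-model dataset.</p>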
<sec id="sec-18">
      <title>Table 2: Datasets used in the experiments, with the number of classes</title>
      <p>ArXiv [14] – 40 classes; Flickr [15] – 7; Computers [16] – 10; Pubmed [17] – 3; DBLP [18] – 4; Squirrel [19] – 5; Cora [17] – 70.</p>
    </sec>
    <sec id="sec-22">
      <title>-</title>
<p>We use the design space [20] to run the GNN for each dataset, with h_v^(0) = x_v being the feature vector corresponding to the node v ∈ V. The parameters of the design space are described in Table 3 and N(v) denotes the 1-hop neighbourhood of the node v.</p>
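<p>The GNN layer equation itself did not survive extraction; purely as an illustration, a generic mean-aggregation step over the 1-hop neighbourhood N(v), starting from h_v^(0) = x_v, might look like (a sketch, not the paper's exact layer):

```python
def mean_aggregation_step(h, neighbours):
    # h: node -> feature vector (h_v^(0) = x_v for the first layer);
    # neighbours: node -> list of 1-hop neighbours N(v).
    out = {}
    for v, ns in neighbours.items():
        agg = [sum(h[u][i] for u in ns) / len(ns) for i in range(len(h[v]))]
        # Combine self and neighbourhood information (a plain average here;
        # real GNN layers apply learned weights and a non-linearity).
        out[v] = [(a + b) / 2 for a, b in zip(h[v], agg)]
    return out
```
</p>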
<sec id="sec-22-1">
        <title>-</title>
        <p>[Figure 3: True log loss of the GNN vs. the meta-model prediction, for the datasets ArXiv, Computers, CoraFull, DBLP, Flickr, PubMed and Squirrel. Train: RMSE = 1.32e-02, Corr = 0.99; Test: RMSE = 6.10e-02, Corr = 0.92.]</p>
        <p>The tuple (basic graph properties, target class specific
properties, hyper-parameters, performance measured on
the test nodes) constitutes a datapoint in the final dataset
(see Figure 1).</p>
<p>Our meta-model is a random forest regression model
with 100 trees, a mean-squared-error split criterion,
and at most 30% of features considered at each split.
3.2. GNN performance prediction based
on graph properties
Given a dataset described in Section 3.1, we train a
regression meta-model predicting the performance metric
using the graph properties. In this experiment, we
consider a selection of the best performance over all available
hyper-parameter settings for each graph and target class,
where only graph properties are used for training.</p>
<p>The dataset was split randomly into training and testing subsets of 93 and 46 data-points, respectively. The meta-model was optimised to minimize the MSE between the model prediction and the true performance of the GNN.</p>
        <p>We evaluated Spearman correlation of the considered
graph properties with the target metrics within the dataset
(see Figure 2). The results show very high correlation
between the log loss and the Brier Score and also high
correlation between the ROC AUC and precision at 10%
of positive nodes. Additionally, graph properties
correlate better with log loss and Brier score, indicating better
performance in predicting these metrics (which is later
confirmed in the experiments).</p>
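<p>The Spearman correlation used here is just the Pearson correlation of ranks; a dependency-free sketch:

```python
def rank(xs):
    # Average ranks; tied values share the mean of their positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    groups = {}
    for pos, idx in enumerate(order):
        groups.setdefault(xs[idx], []).append((pos, idx))
    for members in groups.values():
        avg = sum(pos for pos, _ in members) / len(members) + 1
        for _, idx in members:
            ranks[idx] = avg
    return ranks

def spearman(xs, ys):
    # Pearson correlation computed on the ranks of xs and ys.
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```
</p>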
        <p>The results of our meta-model predicting the true log
loss of the GNN are shown in Figure 3. We can see quite
decent performance on the testing set. Although we do
not use this performance directly for any task, it provides
us with the important information that the meta-model
does not just memorize the training set and indeed uses
the graph properties to model the true performance of
the GNN. Based on this generalization ability, we claim
that graph properties are driving the decision of both the
meta-model as well as the underlying GNN (Section 2.5).</p>
<p>Based on the SHAP explanation of the meta-model, we
evaluated how individual graph properties affect the final
GNN performance (see results in Figures 4 and 5 for
log loss and ROC AUC). We can observe that the most
important graph properties differ for each task.</p>
        <p>As expected, a higher homophily (all of its variants)
contributes to better performance. Interestingly, a higher
class ratio leads to better performance in ROC-AUC
prediction but worsens the performance in log-loss
prediction. Although these observations are very interesting
and essentially answer our research question, we should
also consider the limitations of these results. Firstly, their
validity is conditioned by the validity of our hypothesis
assuming that the explanation of the meta-model holds
for the task itself. Secondly, as the graph properties are
not independent of each other (see Figure 2), the impact of
one particular property can be reflected in the importance
of multiple correlated properties. We leave a deeper
investigation of these limitations and their impact for future
work.
3.3. Hyper-parameter optimization
In this experiment, we use the dataset generation method
described in Figure 1 for each graph from Table 2. We
randomly split the data into train and test sets with a ratio r,
so that the training set size is given by round(rN), where N is the
dataset size. We provide 100 realizations of this split for
each ratio  . In each realization, we train the meta-model
on the training set and calculate predictions on the test
set. We consider the performance based on the following
hyper-parameter selection procedures:
• Reference (random search): We select the best
performance on the training set plus one sample
from the test set to ensure fair comparison.
• Ours: We find the hyper-parameter setup
achieving the best performance prediction on the test
set, evaluate the true corresponding performance,
and select the best performance on the training
set along with the evaluated one.
• Optimum: We select the best performance from
both the test and training sets.
• Ours - Cross-datasets: Similar to “ours” method,
but we consider all graph datasets except the
evaluated one for training.</p>
        <p>The mean of the resulting performance over
realisations is reported in Figure 6. In addition to the
aforementioned hyper-parameter selection procedures, we
consider one more reference (best hyper-parameter) by
selecting a single hyper-parameter setup with the best
average performance over all binary tasks for each dataset,
ensuring that the model does more than just learn the
best setup for a specific dataset.</p>
      </sec>
<sec id="sec-22-2">
        <title>-</title>
        <p>[Figure 4: SHAP values (impact on model output) of the graph properties for the meta-model predicting log loss. Features from most to least important: node homophily, edge homophily, class ratio, relative presence of positive edges, positive to negative attribute covariance, average positive node degree, number of components, fraction of positive nodes of degree &gt; 2, positive class attribute variance, and the sum of 6 other features.]</p>
        <p>[Figure 5: SHAP values of the graph properties for the meta-model predicting ROC AUC. Features from most to least important: class homophily, node count, positive to negative attribute covariance, number of components, average positive node degree, average node degree, edge homophily, positive class attribute variance, class ratio, and the sum of 6 other features.]</p>
      </sec>
      <sec id="sec-22-22">
<title>-</title>
<p>As we can see, the suggested method (“ours”) outperforms the reference in almost all cases, resulting in a significant difference, for example, in the Cora dataset. However, the most interesting result is achieved by the “ours cross-datasets” method. This method is evidently able to learn the optimal parameter setup from the graph properties, since it achieves nearly optimal performance across all datasets. The comparison to the best hyper-parameter reference method ensures that the meta-model did not simply learn the global solution for all datasets.</p>
        <p>4. Conclusion
We propose a systematic approach to linking graph properties with the corresponding GNN performance using a simple meta-model. This meta-model is trained to predict the true performance based on the graph properties. We experimentally validated the generalization capability of this meta-model on common datasets in the graph research community. By interpreting the meta-model’s explanations, we identified graph properties that influence the meta-model’s behavior and claim that this interpretation also applies to the impact on GNN performance. We evaluated these properties and found that they align with our expectations.</p>
        <p>The meta-model predictions were also utilized to solve the hyper-parameter optimization problem. Leveraging the fact that the meta-model is computationally cheaper compared to the GNN, we demonstrated that relying on the meta-model’s predictions can lead to superior performance compared to the reference random search method. Specifically, when the meta-model incorporates knowledge from other graph datasets, we achieved almost optimal performance even without seeing any data points from the target dataset. This indicates that the model is capable of learning solely from the graph properties.</p>
        <p>The proposed hyper-parameter search method can potentially be extended beyond graph datasets, where we train the meta-model on suitable properties of the given dataset. However, in this paper, we only scratched the surface of this topic, which warrants further research and an in-depth survey of available works on hyper-parameter optimization. In the context of this paper, we view it as a validation of the concept of learning a meta-model based on graph properties. Nonetheless, the presented results offer a practical approach to solving the hyper-parameter search problem for graph datasets.</p>
        <p>References
[7] S. M. Lundberg, S.-I. Lee, A Unified Approach to Interpreting Model Predictions, in: Advances in Neural Information Processing Systems, volume 30, Curran Associates, Inc., 2017.
[8] L. S. Shapley, Notes on the N-Person Game — II: The Value of an N-Person Game, Technical Report, RAND Corporation, 1951.
[9] M. E. J. Newman, Mixing patterns in networks, Physical Review E 67 (2003) 026126.
[10] L. Yang, et al., Diverse Message Passing for Attribute with Heterophily, in: Advances in Neural Information Processing Systems, volume 34, Curran Associates, Inc., 2021, pp. 4751–4763.
[11] J. Zhu, et al., Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs, in: Advances in Neural Information Processing Systems, volume 33, Curran Associates, Inc., 2020, pp. 7793–7804.
[12] H. Pei, et al., Geom-GCN: Geometric Graph Convolutional Networks, 2020. ArXiv:2002.05287 [cs, stat].
[13] D. Lim, et al., Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods, in: Advances in Neural Information Processing Systems, volume 34, Curran Associates, Inc., 2021, pp. 20887–20902.
[14] W. Hu, et al., Open Graph Benchmark: Datasets for Machine Learning on Graphs, 2021. ArXiv:2005.00687 [cs, stat].
[15] H. Zeng, et al., GraphSAINT: Graph Sampling Based Inductive Learning Method, in: International Conference on Learning Representations, 2019.
[16] O. Shchur, et al., Pitfalls of Graph Neural Network Evaluation, 2019. ArXiv:1811.05868 [cs, stat].
[17] Z. Yang, W. Cohen, R. Salakhudinov, Revisiting Semi-Supervised Learning with Graph Embeddings, in: Proceedings of The 33rd International Conference on Machine Learning, PMLR, New York, NY, USA, 2016, pp. 40–48.
[18] A. Bojchevski, S. Günnemann, Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking, in: 6th International Conference on Learning Representations, 2018.
[19] B. Rozemberczki, C. Allen, R. Sarkar, Multi-Scale attributed node embedding, Journal of Complex Networks 9 (2021) cnab014.
[20] J. You, Z. Ying, J. Leskovec, Design Space for Graph Neural Networks, in: Advances in Neural Information Processing Systems, volume 33, Curran Associates, Inc., 2020, pp. 17009–17021.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Topping</surname>
          </string-name>
          , et al.,
          <article-title>Understanding over-squashing and bottlenecks on graphs via curvature</article-title>
          ,
          <source>in: The Tenth International Conference on Learning Representations</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Veličković</surname>
          </string-name>
, Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Inductive representation learning on large graphs</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Prudêncio</surname>
          </string-name>
,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Ludermir</surname>
          </string-name>
          ,
          <article-title>Predicting the performance of learning algorithms using support vector machines as meta-regressors</article-title>
          ,
          <source>in: Artificial Neural Networks-ICANN</source>
          <year>2008</year>
          : 18th International Conference, Prague, Czech Republic,
          <source>September 3-6</source>
          ,
          <year>2008</year>
          , Proceedings,
          <source>Part I 18</source>
          , Springer,
          <year>2008</year>
          , pp.
          <fpage>523</fpage>
          -
          <lpage>532</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Elder</surname>
          </string-name>
          , et al.,
          <source>Learning Prediction Intervals for Model Performance, Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>35</volume>
          (
          <year>2021</year>
          )
          <fpage>7305</fpage>
          -
          <lpage>7313</lpage>
          . Number:
          <volume>8</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Maggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bouvier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dreyfus-Schmidt</surname>
          </string-name>
          ,
          <article-title>Performance Prediction Under Dataset Shift</article-title>
          , in:
          <year>2022</year>
26th International Conference on Pattern Recognition (ICPR), pp. 2466-2474. ISSN: 2831-7475.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>