=Paper=
{{Paper
|id=Vol-2962/paper50
|storemode=property
|title=Experimental Investigation of Neural and Weisfeiler-Lehman-Kernel Graph Representations for Downstream Classification
|pdfUrl=https://ceur-ws.org/Vol-2962/paper50.pdf
|volume=Vol-2962
|authors=Sergej Borisov,Marek Dědič,Martin Holeňa
|dblpUrl=https://dblp.org/rec/conf/itat/BorisovDH21
}}
==Experimental Investigation of Neural and Weisfeiler-Lehman-Kernel Graph Representations for Downstream Classification ==
Experimental Investigation of Neural and Weisfeiler-Lehman-Kernel Graph Representations for Downstream SVM-Based Classification

Sergej Borisov (1), Marek Dědič (2,3), and Martin Holeňa (4)

(1) Faculty of Mathematics and Physics, Charles University, Prague, borisov@cs.cas.cz
(2) Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University, Prague, marek@dedic.eu
(3) Cisco Cognitive Intelligence, Cisco Systems, Inc., Prague
(4) Institute of Computer Science, Academy of Sciences, Prague, martin@cs.cas.cz

Abstract: Graphs are one of the most ubiquitous kinds of data. However, data analysis methods have been developed primarily for numerical data, and to make use of them, graphs need to be represented as elements of some Euclidean space. An increasingly popular way of representing them in this way are graph neural networks (GNNs). Because data analysis applications typically require identical results for isomorphic graphs, the representations learned by GNNs also need to be invariant with respect to graph isomorphism. That motivated recent research into the possibilities of recognizing non-isomorphic pairs of graphs by GNNs, primarily based on the Weisfeiler-Lehman (WL) isomorphism test. This paper reports the results of a first experimental comparison of four variants of two important GNNs based on the WL test from the point of view of graph representation for downstream classification by means of a support vector machine (SVM). Those methods are compared not only with each other, but also with a recent generalization of the WL subtree kernel. For all GNN variants, two different representations are included in the comparison. The comparison revealed that the four considered representations of the same kind of GNN never significantly differ. On the other hand, there was always a statistically significant difference between representations originating from different kinds of GNNs, as well as between any representation originating from any of the considered GNNs and the representation originating from the generalized WL kernel.

Keywords: graph representation learning, graph neural networks, message-passing networks, Weisfeiler-Lehman isomorphism test, Weisfeiler-Lehman subtree kernel

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

At the present time, graph-structured data are one of the most ubiquitous kinds of data. However, from the point of view of data analysis and knowledge discovery in data, they have received attention only during the last decades. Therefore, data analysis methods for common tasks such as classification, regression and clustering have been developed primarily for numerical data, and if they are to be used for graph-structured data, graphs need to be represented as vectors in some Euclidean space. This requirement is not specific to graphs, but is also encountered with other kinds of non-numerical data, the best-known example probably being textual data. The seminal papers about graph representation [20] and [9], which introduced, respectively, the neighbourhood sampling strategies DeepWalk and node2vec, were strongly influenced by the Skip-gram model for text, implemented by the word2vec algorithm [16, 17].

An increasingly popular way of representing graphs by vectors is using graph neural networks [10]. Typically, downstream applications are desired to provide identical results for isomorphic graphs. Consequently, the representations learned by GNNs need to be invariant with respect to isomorphism. That motivated recent research into the possibilities of recognizing non-isomorphic pairs of graphs by GNNs [1, 3, 6, 11, 15, 18, 29], primarily based on the classical WL isomorphism test, which is capable of revealing such pairs in many situations [27].

The WL test iteratively constructs neighbourhood subtrees rooted in graph vertices. Such a construction can also be used for graph kernels [22]. In particular, [24] introduced the WL graph kernels, of which the most relevant to our work is the WL subtree kernel. That kernel was recently generalized to the relaxed WL kernel [23]. Whereas the kernel itself evaluates the graph with a scalar value, the rooted subtrees used in the kernel definition can easily be represented with vectors of non-negative numbers.
Therefore, the WL kernels can be viewed as a kernel counterpart of representation learning with WL-test-based GNNs.

This work-in-progress paper reports a first experimental comparison of two important WL-test-based GNNs with the general relaxed WL kernel from the point of view of graph representation for downstream classification by means of an SVM. Of each GNN, four variants were available, and for all employed GNN variants, two different representations were included in the comparison. The comparison was performed on 20 graphsets created from benchmark datasets, and its results were tested for statistical significance.

The next section gives an overview of message-passing neural networks, which are the most common kind of GNNs, and at the same time the kind to which the known WL-test-based GNNs belong. It also recalls the principles of the relaxed WL kernel. The key part of the paper is Section 3, in which the performed experimental comparison is described and its first results are presented. Finally, Section 4 concludes the paper and indicates possible further research.

2 Methodology

2.1 Preliminaries

In this comparison, we consider undirected graphs with labelled and coloured vertices. More precisely, graphs G = (V, E, λ) such that

- E ⊆ {X ⊆ V : #X = 2}, where V is the set of vertices, E is the set of edges, and #X denotes the cardinality of X;
- λ : V → R^{d_λ} denotes a general labelling;
- ∃ c : V → {0,1}^{d_c}, called a colouring, where d_c ≤ d_λ, ∀v ∈ V : (c(v))_1 + ... + (c(v))_{d_c} = 1, and c fulfils c = (λ_1, ..., λ_{d_c}), i.e. the colouring is part of the general labelling, which may in addition also contain other components.

Using this notation and the symbol N(v) for the neighbours of a vertex v in the graph G, the WL isomorphism test [27] can be described. The test consists in an iterative algorithm applied to pairs G, G' of graphs with equal numbers of vertices and such that if G and G' are isomorphic, then the isomorphism preserves their colourings. In each iteration, the colourings are updated, taking into account the neighbours of all vertices, in such a way that an isomorphism preserves also the updated colourings. Hence, if after a finite number of iterations some value of the updated colouring occurs in the two graphs with different frequencies, then the isomorphism of G and G' can be rejected. However, it can never be definitely confirmed. The pseudocode of the WL test is given in Algorithm 1.

Algorithm 1: Weisfeiler-Lehman isomorphism test

Require: graphs G, G' with sets of vertices V, V', sets of edges E, E', and colourings c, c' such that V ∩ V' = ∅ and #V = #V'; a maximal number of iterations h ≤ #V.
1: Define the colouring c_0 : V ∪ V' → {0,1}^{d_c} ∪ {0,1}^{d_{c'}} by c_0(v) = c(v) if v ∈ V, and c_0(v) = c'(v) if v ∈ V'.   (1)
2: Set Σ_0 = c_0(V ∪ V').
3: Order Σ_0 as σ_0^1, ..., σ_0^{#Σ_0}.
4: if there exists j = 1, ..., #Σ_0 such that #{v ∈ V : c_0(v) = σ_0^j} ≠ #{v ∈ V' : c_0(v) = σ_0^j} then
5:     return the fact that G and G' are not isomorphic
6: else
7:     set i = 1
8:     define the colouring c_1 : V ∪ V' → Σ_0 × N_0^{#Σ_0} by c_1(v) = (c_0(v), (s_1, ..., s_{#Σ_0})), where s_j = #{u ∈ N(v) : c_0(u) = σ_0^j} for j = 1, ..., #Σ_0   (2)
9:     set Σ_1 = c_1(V ∪ V')
10:    order Σ_1 as σ_1^1, ..., σ_1^{#Σ_1}
11: end if
12: while i < h and #{v ∈ V : c_i(v) = σ_i^j} = #{v ∈ V' : c_i(v) = σ_i^j} for j = 1, ..., #Σ_i do
13:    increment i = i + 1
14:    define the colouring c_i : V ∪ V' → Σ_{i-1} × N_0^{#Σ_{i-1}} by c_i(v) = (c_{i-1}(v), (s_1, ..., s_{#Σ_{i-1}})), where s_j = #{u ∈ N(v) : c_{i-1}(u) = σ_{i-1}^j} for j = 1, ..., #Σ_{i-1}   (3)
15:    set Σ_i = c_i(V ∪ V')
16:    order Σ_i as σ_i^1, ..., σ_i^{#Σ_i}
17: end while
18: if there exists j = 1, ..., #Σ_i such that #{v ∈ V : c_i(v) = σ_i^j} ≠ #{v ∈ V' : c_i(v) = σ_i^j} then
19:    return the information that G and G' are not isomorphic
20: else
21:    return the information that the WL test did not reject the isomorphism of G and G' within h iterations
22: end if
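The colour-refinement loop of Algorithm 1 is straightforward to prototype. The following is a minimal Python sketch of the test for a pair of graphs given as adjacency lists with hashable initial colours; it illustrates the refinement idea and is not the implementation used in the experiments reported below.

```python
from collections import Counter

def wl_test(adj_g, col_g, adj_h, col_h, h):
    """One-dimensional WL test on two graphs given as adjacency lists.

    adj_g, adj_h: dict vertex -> list of neighbouring vertices
    col_g, col_h: dict vertex -> hashable initial colour
    h: maximal number of refinement iterations
    Returns False if non-isomorphism is detected, True if the test
    cannot reject isomorphism within h iterations.
    """
    cg, ch = dict(col_g), dict(col_h)
    for _ in range(h + 1):
        # Compare the colour histograms of the two graphs (lines 4 and 18).
        if Counter(cg.values()) != Counter(ch.values()):
            return False
        # Refine: each vertex keeps its old colour and appends the sorted
        # multiset of neighbour colours (lines 8 and 14).
        cg = {v: (cg[v], tuple(sorted(cg[u] for u in adj_g[v]))) for v in adj_g}
        ch = {v: (ch[v], tuple(sorted(ch[u] for u in adj_h[v]))) for v in adj_h}
    return True
```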
Denote A_G and A_{G'} the adjacency matrices of G and G', respectively, and 1_{#V} the vector (1, ..., 1) of length #V. Then the situation that the WL isomorphism test cannot reject the isomorphism of G and G' can be characterized by the following three equivalent conditions [5]:

(i) the WL test does not reject the isomorphism of G and G' within #V iterations;
(ii) for each tree T, the number of homomorphisms from T to G equals the number of homomorphisms from T to G';
(iii) there exists a matrix X ∈ [0,1]^{#V × #V} solving the following system of linear equations:

A_G X = X A_{G'},   X 1_{#V} = 1_{#V},   1_{#V}^T X = 1_{#V}^T.   (4)

Observe that if G and G' are really isomorphic, then the equations (4) are solved by the permutation matrix X fulfilling A_{G'} = X^T A_G X, which transforms A_G into A_{G'}.
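Condition (iii) is a linear feasibility problem, so it can be checked numerically. Below is a small sketch using NumPy and SciPy; it assumes symmetric 0/1 adjacency matrices of equal size and simply asks whether a doubly stochastic X with A_G X = X A_{G'} exists. The vectorization convention and the solver choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def fractionally_isomorphic(a_g, a_h):
    """Check condition (iii): existence of a doubly stochastic X with
    a_g @ X == X @ a_h, via a feasibility LP over vec(X) (column-stacked)."""
    n = a_g.shape[0]
    eye, ones = np.eye(n), np.ones((1, n))
    # vec(A X) = (I kron A) vec(X),  vec(X B) = (B^T kron I) vec(X)
    commute = np.kron(eye, a_g) - np.kron(a_h.T, eye)   # A_G X - X A_G' = 0
    row_sums = np.kron(ones, eye)                       # X 1 = 1
    col_sums = np.kron(eye, ones)                       # 1^T X = 1^T
    a_eq = np.vstack([commute, row_sums, col_sums])
    b_eq = np.concatenate([np.zeros(n * n), np.ones(n), np.ones(n)])
    res = linprog(c=np.zeros(n * n), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0.0, 1.0), method="highs")
    return res.status == 0   # 0 means a feasible point was found
```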
Finally, a feedforward artificial neural network NN will, for the purpose of this paper, be a stationary connectionist feedforward structure

NN = (I, O, H, C, F, (Φ_e)_{e∈C}, (Ψ_ν)_{ν∈H∪O}),   (5)

where

- I, O and H are mutually disjoint sets of input, output and hidden neurons;
- C ⊆ I × H ∪ H × H ∪ H × O is a set of connections;
- ∀ν ∈ I : out(ν) = {ξ ∈ H ∪ O : (ν, ξ) ∈ C} ≠ ∅, where out(ν) is the output set of the neuron ν;
- ∀ν ∈ O : inp(ν) = {ξ ∈ I ∪ H : (ξ, ν) ∈ C} ≠ ∅, where inp(ν) is the input set of the neuron ν;
- ∀ν ∈ H : inp(ν) ≠ ∅ and out(ν) ≠ ∅;
- ∃ Dom ⊆ R^{#I} : F ⊆ {F : Dom → R^{#O}}, where F is the set of mappings computable by the NN;
- ∀ν ∈ H ∪ O ∃ Dom_ν ⊆ R^{#inp(ν)} : Ψ_ν ⊆ {ψ : Dom_ν → R}; the elements of Ψ_ν are called somatic operators available for the neuron ν;
- ∀e ∈ C ∃ Dom_e ⊆ R : Φ_e ⊆ {φ : Dom_e → R}; the elements of Φ_e are called synaptic operators available for the connection e;
- all mappings computable by the NN fulfil the following condition of recursive composability: ∀F ∈ F ∀ν ∈ H ∪ O ∀ξ ∈ inp(ν) ∃ψ_ν ∈ Ψ_ν ∃φ_{(ξ,ν)} ∈ Φ_{(ξ,ν)} ∀X ∈ Dom ∃ A_X : I ∪ H ∪ O → R, the activation state of the NN corresponding to the input X, fulfilling

A_X|_I = X,   A_X|_O = F(X),   (6)
∀ν ∈ H ∪ O : A_X(ν) = ψ_ν((φ_{(ξ,ν)}(A_X(ξ)))_{ξ∈inp(ν)}).   (7)

The above definition covers in particular multilayer perceptrons and radial basis function networks, for which a plethora of theoretical results is available, such as the classical universal approximation results concerning the density of F in general function spaces [4, 12–14, 19], and results concerning the applicability of the laws of large numbers and of the central limit theorem [28]. As usual, the set F will be assumed parametrizable with a finite-dimensional set W of parameters. Hence,

∃q ∈ N ∃W ⊆ R^q ∃ω : W → {F : Dom → R^{#O}} such that F = ω(W).   (8)

Needless to say, the parametrizability of F also induces the parametrizability of the sets of somatic operators Ψ_ν, ν ∈ H ∪ O, and of synaptic operators Φ_e, e ∈ C.

2.2 Message-passing Neural Networks

Message-passing neural networks (MPNN) are probably the kind of neural networks most often used for graph data. They are characterized by the following properties [10]:

1. Their input and hidden neurons are structured into a sequence of layers H_0 = I, H_1, ..., H_L, as in the case of multilayer perceptrons (MLPs) and radial basis function networks:

k ≠ ℓ ⇒ H_k ∩ H_ℓ = ∅,   C ∩ (H × H) ⊆ ⋃_{1≤ℓ≤L} H_{ℓ-1} × H_ℓ.   (9)

2. The structure of their connections as well as their somatic and synaptic operators attempt to reflect the structure and properties of the graphs to which the network is applied. The approach that is most typical, and at the same time perfectly suitable for the kind of graphs considered in this paper, consists in the following:

- the network connections attempt to reflect the structure of graph edges; in particular, input and output sets of neurons attempt to reflect the neighbourhoods of graph vertices;
- in each layer ℓ = 0, ..., L, a group of neurons ν_{v,1}, ..., ν_{v,d_ℓ} ∈ H_ℓ can be assigned to a vertex v ∈ V, with d_0 = d_λ, and the vector ψ_v^{(ℓ)} = (ψ_{ν_{v,1}}, ..., ψ_{ν_{v,d_ℓ}}) of their somatic operators for a layer ℓ < L fulfils

ψ_v^{(ℓ+1)} = f_up^{(ℓ)}(ψ_v^{(ℓ)}, f_ag^{(ℓ)}(ψ_{u_1}^{(ℓ)}, ..., ψ_{u_{#N(v)}}^{(ℓ)})),   (10)

where f_up^{(ℓ)} : R^{d_ℓ} × R^{d_ag} → R^{d_{ℓ+1}} is called the update function, {u_1, ..., u_{#N(v)}} = N(v), f_ag^{(ℓ)} : R^{d_ℓ × #N(v)} → R^{d_ag} is called the aggregation function, and d_ag ∈ N, with the usual choice d_ag = d_ℓ or d_ag = d_{ℓ+1};
- synaptic operators are not used.

3. The resulting mapping F composed according to (6) is required to be either invariant or equivariant with respect to permutations of the d_0 columns of its input matrix. Needless to say, equivariance – preserving in the output the permutation of the input columns – is possible only if #O = d_0.

An important concept relevant to MPNNs as well as to other kinds of GNNs is consistency with graph colouring. An MPNN is consistent with the colouring c of a graph G if it fulfils

ψ_u^{(0)} = ψ_v^{(0)} ⇔ c(u) = c(v).   (11)

Because (ψ_v^{(ℓ)})_{v∈V} is for each ℓ = 1, ..., L a mapping into the Euclidean space R^{d_ℓ × #V}, or equivalently, into the Euclidean space R^{d_ℓ #V}, it can be used for graph representation. Usually, however, only (ψ_v^{(L)})_{v∈V} is used to this end. In our experiments reported in Section 3, both possibilities were used.
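Equation (10) describes one round of message passing: each vertex aggregates the states of its neighbours and combines the result with its own state. A minimal sketch of such a layer in plain NumPy follows; the sum aggregation and the tanh update in the example are placeholder choices, not the operators of any particular network discussed below.

```python
import numpy as np

def message_passing_layer(psi, neighbours, f_up, f_ag):
    """One round of equation (10).

    psi:        dict vertex -> current state vector psi_v^(l)
    neighbours: dict vertex -> list of neighbouring vertices
    f_up, f_ag: update and aggregation functions chosen by the model
    """
    return {v: f_up(psi[v], f_ag([psi[u] for u in neighbours[v]]))
            for v in psi}

# Example with sum aggregation and a purely illustrative update function.
psi0 = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0]), 2: np.array([1.0, 0.0])}
nbrs = {0: [1], 1: [0, 2], 2: [1]}
psi1 = message_passing_layer(psi0, nbrs,
                             f_up=lambda own, agg: np.tanh(own + agg),
                             f_ag=lambda msgs: np.sum(msgs, axis=0))
```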
The network 1-GNN was proposed in [18] as part of a study of the relationships between GNNs and the WL test. That study actually considered more general GNNs, called k-GNNs, with a general k ∈ N, and investigated their relationships to k-WL tests, which are generalizations of the WL test from vertices to k-tuples of vertices. The 1-GNN is not only the simplest network of this kind, but at the same time also a specific kind of MPNN, for which the functions f_up^{(ℓ)} and f_ag^{(ℓ)} in (10) are defined as

f_up^{(ℓ)}(ψ_v^{(ℓ)}, f_ag^{(ℓ)}(ψ_{u_1}^{(ℓ)}, ..., ψ_{u_{#N(v)}}^{(ℓ)})) = σ(W_1^{(ℓ)} ψ_v^{(ℓ)} + b^{(ℓ)} + f_ag^{(ℓ)}(ψ_{u_1}^{(ℓ)}, ..., ψ_{u_{#N(v)}}^{(ℓ)})),
f_ag^{(ℓ)}(ψ_{u_1}^{(ℓ)}, ..., ψ_{u_{#N(v)}}^{(ℓ)}) = Σ_{j=1}^{#N(v)} W_2^{(ℓ)} ψ_{u_j}^{(ℓ)},   (12)

where W_1^{(ℓ)}, W_2^{(ℓ)} ∈ R^{d_{ℓ+1} × d_ℓ}, b^{(ℓ)} ∈ R^{d_{ℓ+1}}, and σ is a component-wise non-linear activation function, e.g. a sigmoid or ReLU.

From the point of view of a relationship between this kind of networks and the WL test, it is important that for a 1-GNN consistent with the graph colouring, and for the colouring c_i constructed in the i-th iteration of Algorithm 1, i ≤ min(h, L), it was proven in [18] that:

(i) ∀u, v ∈ V : c_i(u) = c_i(v) ⇒ ψ_u^{(i)} = ψ_v^{(i)};
(ii) there exists a sequence of weight matrices and bias vectors (W_1^{(0)}, W_2^{(0)}, b^{(0)}), ..., (W_1^{(i-1)}, W_2^{(i-1)}, b^{(i-1)}) such that if the functions f_up^{(0)}, f_ag^{(0)}, ..., f_up^{(i-1)}, f_ag^{(i-1)} are defined using this sequence, then for any i' = 1, ..., i and u, v ∈ V : ψ_u^{(i')} = ψ_v^{(i')} ⇒ c_{i'}(u) = c_{i'}(v).

The Graph Isomorphism Network (GIN) was proposed in [29] and defines the functions f_up^{(ℓ)} and f_ag^{(ℓ)} in (10) as

f_up^{(ℓ)}(ψ_v^{(ℓ)}, f_ag^{(ℓ)}(ψ_{u_1}^{(ℓ)}, ..., ψ_{u_{#N(v)}}^{(ℓ)})) = f_mlp^{(ℓ)}((1 + ε_ℓ) ψ_v^{(ℓ)} + f_ag^{(ℓ)}(ψ_{u_1}^{(ℓ)}, ..., ψ_{u_{#N(v)}}^{(ℓ)})),
f_ag^{(ℓ)}(ψ_{u_1}^{(ℓ)}, ..., ψ_{u_{#N(v)}}^{(ℓ)}) = Σ_{j=1}^{#N(v)} ψ_{u_j}^{(ℓ)},   (13)

where ε_ℓ > 0 and f_mlp^{(ℓ)} : R^{d_ℓ} → R^{d_{ℓ+1}} is an MLP producing the representation of the graph in the (ℓ+1)-st layer.

A relationship between GIN and the WL test is based on applying the universal approximation results for MLPs [4, 12–14] to f_mlp^{(ℓ)}, in combination with a result proven in [29] as Corollary 6: Let X ⊆ R^{d_ℓ} be countable and bounded, and p_max ∈ N. Then there exists a function f : X → R^{d_{ℓ+1}} such that for infinitely many choices of ε, including all irrational numbers, the function h defined on X × ⋃_{p=1}^{p_max} X^p by

∀c ∈ X ∀p = 1, ..., p_max ∀X = (x_1, ..., x_p) ∈ X^p : h(c, X) = (1 + ε) f(c) + Σ_{j=1}^{p} f(x_j)   (14)

is unique with respect to multisets, which means that

∀c ∈ X ∀p = 1, ..., p_max ∀X, X' ∈ X^p : [∀x ∈ X : #{j = 1, ..., p : x_j = x} = #{j = 1, ..., p : x'_j = x}] ⇒ h(c, X) = h(c, X').   (15)
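For concreteness, here is a sketch of the two update rules (12) and (13) in PyTorch; the layer widths, the ReLU nonlinearity and the two-layer MLP inside the GIN update are illustrative choices, not the exact configuration used in the experiments of Section 3.

```python
import torch
import torch.nn as nn

class OneGNNLayer(nn.Module):
    """Update rule (12): sigma(W1 psi_v + b + sum_u W2 psi_u)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin_self = nn.Linear(d_in, d_out, bias=True)    # W1, b
        self.lin_neigh = nn.Linear(d_in, d_out, bias=False)  # W2

    def forward(self, psi, adj):
        # psi: (n_vertices, d_in), adj: dense 0/1 adjacency matrix as float tensor
        return torch.relu(self.lin_self(psi) + self.lin_neigh(adj @ psi))

class GINLayer(nn.Module):
    """Update rule (13): MLP((1 + eps) psi_v + sum_u psi_u)."""
    def __init__(self, d_in, d_out, eps=0.1):
        super().__init__()
        self.eps = eps
        self.mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(),
                                 nn.Linear(d_out, d_out))

    def forward(self, psi, adj):
        return self.mlp((1.0 + self.eps) * psi + adj @ psi)
```

In the PyTorch Geometric library used in Section 3, closely related updates are provided by the GraphConv and GINConv layers, respectively.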
2.3 Relaxed Weisfeiler-Lehman Kernel

The relaxed WL kernel [23] is a recent relaxation of the WL subtree kernel, which was proposed in [24] and is based on the WL isomorphism test described in Algorithm 1. Taking into account the number of iterations h of the WL test, which is in the context of the WL subtree kernel called depth, the value of this kernel for a pair of graphs G = (V, E, λ), G' = (V', E', λ') is defined as

k_WLsubtree^{(h)}(G, G') = Σ_{i=0}^{h} Σ_{v∈V} Σ_{v'∈V'} δ(c_i(v), c_i(v')),   (16)

where δ denotes the Kronecker delta, also known as the Dirac kernel. Consequently, the WL subtree kernel reflects only the exact match of the colourings produced for both graphs in every iteration of the WL test, although in many real-world problems, the similarity of those colourings is more important than their exact match.

To overcome this drawback, the exact match is in [23] weakened, for i = 0, ..., h, to the equivalence with respect to the clusters C_1^ρ, ..., C_{k_ρ}^ρ produced by a partitioning ρ of Σ_i. Hence,

∀i, j = 1, ..., k_ρ : i ≠ j ⇒ C_i^ρ ∩ C_j^ρ = ∅,   C_1^ρ ∪ ... ∪ C_{k_ρ}^ρ = Σ_i,
ρ : Σ_i → {1, ..., k_ρ},   ∀k = 1, ..., k_ρ ∀σ ∈ C_k^ρ : ρ(σ) = k.   (17)

Moreover, for each Σ_i, not only one such partitioning ρ is used, but a finite set Θ_i = {ρ_1^i, ..., ρ_{#Θ_i}^i} of them. This finally turns (16) into the definition of the relaxed WL kernel:

k_R-WL^{(h)}(G, G') = Σ_{i=0}^{h} Σ_{ρ∈Θ_i} Σ_{v∈V} Σ_{v'∈V'} δ(ρ(c_i(v)), ρ(c_i(v'))).   (18)

Its name originates from the fact that k_R-WL^{(h)} is more general than k_WLsubtree^{(h)}, to which it reduces if each Θ_i = {ρ_i}, where ρ_i partitions Σ_i into the singletons {σ}, σ ∈ Σ_i, for i = 1, ..., h.

The construction of the partitionings ρ ∈ Θ_i, i = 1, ..., h, in (18) is based on replacing each σ ∈ Σ_i, for i = 1, ..., h, with a set of isomorphic unfolding trees. An unfolding tree T^i(G, v) of depth i in a graph G rooted in a vertex v ∈ V will be defined, in accordance with [5], as a rooted tree T = (V(T), E(T)) with root r such that:

- there exists a homomorphism f from T to G;
- f(r) = v;
- for each non-leaf t ∈ V(T), f induces a bijection between the set of children of t in T and N(f(t)) in G.

For unfolding trees, a natural distance is a tree-edit distance. The definition of the tree-edit distance employed in the context of the relaxed WL kernel can be found in [23].

From the point of view of graph representation, it is important that (16) and (18) are actually scalar products of vectors, which suggests using those vectors as Euclidean-space representations of the graphs G and G' in downstream data analysis applications. In particular for the WL subtree kernel, define a vector φ_WL(G) ∈ R^{n_WL} with n_WL = Σ_{i=0}^{h} #Σ_i as

φ_WL(G) = (φ_{0,1}^{WL}, ..., φ_{0,#Σ_0}^{WL}, φ_{1,1}^{WL}, ..., φ_{h,#Σ_h}^{WL}),   (19)

where for i = 0, ..., h and j = 1, ..., #Σ_i,

φ_{i,j}^{WL} = #{v ∈ V : c_i(v) = σ_i^j}.   (20)

Then (16) can indeed be rewritten as the scalar product

k_WLsubtree^{(h)}(G, G') = φ_WL(G)^T φ_WL(G').   (21)

Similarly for the relaxed WL kernel, define a vector φ_R-WL(G) ∈ R^{n_R-WL} with n_R-WL = Σ_{i=0}^{h} Σ_{j=1}^{#Θ_i} k_{ρ_j^i} as

φ_R-WL(G) = (φ_{0,1}^{R-WL}, ..., φ_{0,#Σ_0}^{R-WL}, φ_{1,1,1}^{R-WL}, ..., φ_{1,1,k_{ρ_1^1}}^{R-WL}, φ_{1,2,1}^{R-WL}, ..., φ_{1,#Θ_1,k_{ρ_{#Θ_1}^1}}^{R-WL}, φ_{2,1,1}^{R-WL}, ..., φ_{h,#Θ_h,k_{ρ_{#Θ_h}^h}}^{R-WL}),   (22)

where for i = 0, ..., h, j = 1, ..., #Θ_i, and k = 1, ..., k_{ρ_j^i},

φ_{i,j,k}^{R-WL} = #{v ∈ V : ρ_j^i(c_i(v)) = k}.   (23)

Then (18) can be rewritten as the scalar product

k_R-WL^{(h)}(G, G') = φ_R-WL(G)^T φ_R-WL(G').   (24)

It is useful to realize that the label sets Σ_i for i = 0, ..., h, on which the definitions of φ_WL and φ_R-WL rely, were defined only for a pair of graphs (G, G'), as Σ_i = c_i(V ∪ V'), cf. line 15 of Algorithm 1. For dealing with a whole set 𝒢 of graphs such that each G ∈ 𝒢 has its own sets of vertices V^G and edges E^G as well as its own labelling λ^G and colouring c^G, it is convenient to generalize their definition to

Σ_i = ⋃_{G∈𝒢} c_i^G(V^G),   i = 0, ..., h.   (25)

It is this definition that we used in our experiments.
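As an illustration of (19)–(21) together with the pooled label sets (25), the following sketch builds WL subtree feature vectors for a whole set of graphs from the colourings produced by a simple refinement routine, and evaluates the kernel as a dot product. It is a simplification for illustration only, not the GenWL implementation used in the experiments.

```python
from collections import Counter
import numpy as np

def wl_colourings(adj, col, h):
    """Colourings c_0, ..., c_h of one graph given as an adjacency list
    (dict vertex -> neighbours) with initial colours (dict vertex -> colour)."""
    history, c = [dict(col)], dict(col)
    for _ in range(h):
        c = {v: (c[v], tuple(sorted(c[u] for u in adj[v]))) for v in adj}
        history.append(c)
    return history

def wl_feature_vectors(graphs, h):
    """phi_WL vectors (19)-(20) for a set of graphs, with the label
    sets Sigma_i pooled over all graphs as in (25)."""
    histories = [wl_colourings(adj, col, h) for adj, col in graphs]
    sigmas = [sorted({lab for hist in histories for lab in hist[i].values()},
                     key=repr)
              for i in range(h + 1)]
    vectors = []
    for hist in histories:
        counts = []
        for i, sigma in enumerate(sigmas):
            freq = Counter(hist[i].values())
            counts.extend(freq[lab] for lab in sigma)
        vectors.append(np.array(counts, dtype=float))
    return vectors

# The WL subtree kernel value (21) of two graphs is then simply the
# scalar product of their feature vectors, e.g. vectors[0] @ vectors[1].
```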
3 Experimental Comparison

For the experimental comparison, we implemented the 1-GNN and GIN networks using the layers and neural network learning methods available in the PyTorch Geometric library [21]. We also made use of the GenWL implementation [8] of the relaxed WL kernel by the authors of the paper [23]. Both network implementations were used in their decay variants, and the GenWL implementation was used in the R-WL* variant, in accordance with the experiments reported in [23].

3.1 Employed Graphsets

To be able to assess the statistical significance of the comparison results, we performed the comparison on 20 mutually disjoint graphsets GS1–GS20, adopting the usual assumption that for the employed data, disjointness is a sufficient condition for statistical independence. Those graphsets were obtained from real-world benchmark sets of graph data. Due to our objective of investigating the suitability of the compared graph representation methods for downstream classification, we used datasets from binary graph classification tasks to this end.

We have chosen four benchmark sets of graph data, which are available both in the PyTorch Geometric library [21] and in the GenWL implementation of the relaxed WL kernel [8]:

1. BZR is a set of 405 ligands for the benzodiazepine receptor, classified with respect to their activity in benzodiazepine binding [25].
2. COX-2 is a set of 467 cyclooxygenase-2 inhibitors, classified with respect to their activity against the human recombinant enzyme [25].
3. DHFR is a set of 756 inhibitors of dihydrofolate reductase, classified with respect to their activity in the inhibition of the enzymatic reduction that converts dihydrofolate to tetrahydrofolate [25].
4. NCI1 is a set of 4110 compounds evaluated in bioassays of the National Cancer Institute on non-small cells of lung tumour, classified with respect to growth inhibition of this kind of human tumour [26].

The precise origin of the graphsets GS1–GS20 in those four benchmark sets is listed in Table 1.

Table 1: Origin of the employed graphsets.

Graphset | Original benchmark data
GS1  | BZR, graphs with nr. 1–200
GS2  | BZR, graphs with nr. 201–405
GS3  | COX2, graphs with nr. 1–250
GS4  | COX2, graphs with nr. 251–467
GS5  | DHFR, graphs with nr. 1–250
GS6  | DHFR, graphs with nr. 251–500
GS7  | DHFR, graphs with nr. 501–756
GS8  | NCI1, graphs with nr. mod 13 = 0
GS9  | NCI1, graphs with nr. mod 13 = 1
GS10 | NCI1, graphs with nr. mod 13 = 2
GS11 | NCI1, graphs with nr. mod 13 = 3
GS12 | NCI1, graphs with nr. mod 13 = 4
GS13 | NCI1, graphs with nr. mod 13 = 5
GS14 | NCI1, graphs with nr. mod 13 = 6
GS15 | NCI1, graphs with nr. mod 13 = 7
GS16 | NCI1, graphs with nr. mod 13 = 8
GS17 | NCI1, graphs with nr. mod 13 = 9
GS18 | NCI1, graphs with nr. mod 13 = 10
GS19 | NCI1, graphs with nr. mod 13 = 11
GS20 | NCI1, graphs with nr. mod 13 = 12
3.2 Experimental Setup

Before starting the experiments, we needed to make two kinds of decisions. The first concerns the compared GNNs: what network topology to use, and how to obtain a graph representation from a trained network. The second concerns the comparison as a whole, including the relaxed WL kernel: how to evaluate the suitability of each representation for downstream classification. We now address each of those design decisions in some detail.

Topology of the compared GNNs. Due to the fact that the GNNs were trained on classification data, and also due to the default settings of the employed GenWL implementation of the relaxed WL kernel, both of them contained the following layers:

(i) an input layer, which receives the colourings of vertices, i.e. the components (λ_1, ..., λ_{d_c}) of the label λ, no matter whether the label possibly has still other components, i.e. whether d_c = d_λ or d_c < d_λ;
(ii) 5 MPNN layers of the same size, specific for 1-GNN and for GIN;
(iii) an average-pooling layer averaging over the vertices of the graph;
(iv) a cross-entropy classification layer.

As to the MPNN layers, we investigated 2 variants of them, differing in size:

(iia) the size of all 5 layers equals the average number of vertices among the graphs in the graphset;
(iib) the size of all 5 layers equals the maximal number of vertices among the graphs in the graphset.

The GNN values used for graph representation were obtained as the activities, i.e. the results of somatic operators, in the neurons of the MPNN layers. In both compared GNNs, the activities are obtained separately for each vertex of each graph. Therefore, the activities for each graph were first averaged over all of its vertices, in accordance with using an average-pooling layer during their training. According to the neurons that were actually used to this end, two kinds of graph representation were considered:

(i) representation restricted to the activities of the neurons of the last MPNN layer, which is a restriction commonly encountered in representation learning by artificial neural networks;
(ii) representation with the activities of the neurons of all MPNN layers, which is more similar to the representation (24) based on the relaxed WL kernel.

Apart from the GNN topology, the hyperparameter values of all compared methods were set to their defaults in the employed implementation, and if no default was available, to values obtained through slight tuning.

Evaluation of the graph representations with respect to their suitability for downstream classification was by means of the accuracy on test data obtained in classification using each of the representations as input. To this end, a linear support vector machine (SVM) classifier was employed, in accordance with the default setting in the GenWL implementation of the relaxed WL kernel [8]. To increase the reliability of the accuracy assessment, a 10-fold cross-validation was used.
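This evaluation step can be reproduced with standard tooling. The sketch below assumes that a matrix of graph representations and the binary labels are already available, and mirrors the linear-SVM, 10-fold cross-validation protocol described above; the regularization constant and iteration limit are illustrative, not the defaults of the GenWL implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def evaluate_representation(features, labels):
    """Mean and standard deviation of the validation-fold accuracy of a
    linear SVM over a 10-fold cross-validation.

    features: (n_graphs, n_features) array, e.g. averaged MPNN activities
              or the phi_WL / phi_R-WL vectors
    labels:   (n_graphs,) array of 0/1 class labels
    """
    clf = LinearSVC(C=1.0, max_iter=10000)
    scores = cross_val_score(clf, features, labels, cv=10, scoring="accuracy")
    return scores.mean(), scores.std()
```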
3.3 First Results

The results for both kinds of GNN-based representations for the 4 considered networks, i.e. 1-GNN and GIN with both variants (iia) and (iib) of the MPNN layers, as well as the results for the R-WL*-based representation, are presented in Table 2. For each of those representations, the mean and standard deviation of the accuracies on the validation folds are reported, from a 10-fold cross-validation on each of the 20 graphsets.

Table 2: Mean and standard deviation of the validation-fold accuracy [%] from a 10-fold cross-validation of classification by an SVM, using as input the representations of graphs obtained from the 1-GNN and GIN networks, as well as from the R-WL* variant of the relaxed WL kernel. Columns are labelled by the representation kind (last = activities of the last MPNN layer, all = activities of all MPNN layers), the MPNN-layer variant, and the network; MPNN layers (iia) refer to 5 message-passing layers of size equal to the average number of vertices among the graphs in the graphset on which the network is being trained, MPNN layers (iib) to 5 message-passing layers of size equal to the maximal number of vertices among the graphs in that graphset. For each graphset, the representation(s) achieving the highest mean accuracy are marked with *.

Graphset | last/(iia)/1-GNN | last/(iia)/GIN | last/(iib)/1-GNN | last/(iib)/GIN | all/(iia)/1-GNN | all/(iia)/GIN | all/(iib)/1-GNN | all/(iib)/GIN | relaxed WL kernel
GS1  | 96±4  | 94±4  | 98±4*  | 94±4  | 96±5  | 94±3  | 98±3*  | 94±5  | 90±9
GS2  | 86±8  | 82±8  | 88±7   | 83±9  | 85±7  | 86±6  | 89±5*  | 84±9  | 81±9
GS3  | 82±7  | 81±3  | 84±4   | 82±4  | 84±5  | 81±4  | 86±6*  | 85±6  | 82±4
GS4  | 88±6  | 81±8  | 89±6   | 82±7  | 89±5  | 83±3  | 90±8*  | 82±6  | 76±7
GS5  | 81±9  | 79±8  | 84±6*  | 74±8  | 80±5  | 76±7  | 84±5*  | 76±8  | 66±5
GS6  | 96±5* | 85±7  | 92±5   | 84±6  | 96±4* | 84±11 | 94±3   | 84±5  | 83±8
GS7  | 98±3* | 93±5  | 92±3   | 91±5  | 98±3* | 92±4  | 92±5   | 92±5  | 89±5
GS8  | 83±6* | 79±6  | 79±7   | 80±7  | 83±4* | 78±6  | 79±9   | 79±6  | 65±8
GS9  | 80±6  | 75±8  | 93±6   | 75±7  | 79±10 | 76±5  | 94±5*  | 73±7  | 75±7
GS10 | 83±8* | 81±5  | 66±8   | 76±7  | 83±6* | 78±6  | 69±6   | 77±7  | 74±7
GS11 | 73±11 | 79±7* | 75±4   | 71±8  | 75±6  | 78±8  | 77±7   | 71±6  | 71±8
GS12 | 93±4* | 80±6  | 90±5   | 80±5  | 91±4  | 84±8  | 90±7   | 79±7  | 64±6
GS13 | 85±4  | 75±10 | 85±4   | 78±8  | 86±7* | 71±9  | 85±8   | 74±10 | 71±8
GS14 | 85±7* | 73±9  | 81±5   | 72±9  | 85±6* | 76±9  | 82±3   | 71±9  | 69±7
GS15 | 91±6* | 66±11 | 79±7   | 71±6  | 91±4* | 67±7  | 78±7   | 67±12 | 64±6
GS16 | 81±8  | 86±5* | 75±6   | 80±7  | 81±7  | 85±4  | 77±7   | 79±7  | 69±6
GS17 | 87±6  | 78±7  | 91±6*  | 74±8  | 88±6  | 78±8  | 91±5*  | 72±5  | 68±10
GS18 | 92±4  | 78±8  | 94±4   | 75±10 | 93±3  | 75±7  | 95±3*  | 75±8  | 63±8
GS19 | 88±7* | 72±7  | 82±6   | 72±7  | 87±5  | 75±6  | 81±8   | 73±9  | 69±6
GS20 | 85±4  | 73±7  | 90±5*  | 71±9  | 82±6  | 75±8  | 89±9   | 70±6  | 66±7

According to Table 2, the highest accuracy has been achieved most frequently, namely 8 times among the 20 employed graphsets, with the representation by activities of the last layer of the variant (iia) of the MPNN layers of a 1-GNN, as well as with the representation by activities of all layers of the variant (iib) of the MPNN layers of a 1-GNN. However, the differences between the accuracies in Table 2 are not only due to essential differences between the considered representations, but also due to random influences. Separating the former from the latter requires assessing the statistical significance of those differences.
3.4 Assessment of Statistical Significance

To assess the statistical significance of the obtained results, we first tested the basic null hypothesis that the mean classification accuracy for all 9 representations coincides. To this end, we applied the Friedman test with 10 replicates to the results for all 200 validation folds from the cross-validation of the SVM classification on the 20 graphsets. This basic null hypothesis was strongly rejected, with the achieved significance p = 4·10^-92. For the post-hoc analysis, we employed the Wilcoxon signed rank test with a two-sided alternative for all 36 pairs of the investigated representations, because of the inconsistency of the more commonly used mean-ranks post-hoc test, to which Benavoli et al. recently pointed [2]. For the correction for multiple hypotheses testing, we used the Holm method [7].

The results of the pairwise comparisons of the 9 investigated representations and of their significance testing are presented in Table 3. A number n_{a,b} in a row of the table corresponding to a representation a and a column corresponding to a representation b states in how many of the graphsets GS1–GS20 the representation a led to a higher mean classification accuracy than the representation b. If n_{a,b} > n_{b,a} and the difference between the representations a and b is, according to the Wilcoxon signed rank test, significant at the familywise level 5% after the Holm correction, then n_{a,b} is in bold.

Table 3: Comparison of the investigated representations. A number n_{a,b} in a row corresponding to a representation a and a column corresponding to a representation b states in how many of the graphsets GS1–GS20 the representation a led to a higher mean classification accuracy than the representation b. That number is in bold if n_{a,b} > n_{b,a} and the difference between the representations a and b is, according to the Wilcoxon signed rank test, significant at the familywise level 5% after the Holm correction. MPNN layers (iia) refer to 5 MPNN layers of size equal to the average number of vertices among the graphs in the graphset on which the network is being trained, MPNN layers (iib) to 5 MPNN layers of size equal to the maximal number of vertices among the graphs in that graphset; last/all distinguishes representations by activities of the last MPNN layer and of all MPNN layers.

Representation a \ b | last/(iia)/1-GNN | last/(iia)/GIN | last/(iib)/1-GNN | last/(iib)/GIN | all/(iia)/1-GNN | all/(iia)/GIN | all/(iib)/1-GNN | all/(iib)/GIN | relaxed WL kernel
last/(iia)/1-GNN  | –  | 18 | 9  | 19 | 6  | 17 | 9  | 19 | 19
last/(iia)/GIN    | 2  | –  | 4  | 10 | 2  | 9  | 4  | 13 | 17
last/(iib)/1-GNN  | 10 | 15 | –  | 17 | 7  | 16 | 3  | 15 | 19
last/(iib)/GIN    | 0  | 6  | 3  | –  | 0  | 4  | 3  | 9  | 17
all/(iia)/1-GNN   | 6  | 18 | 10 | 20 | –  | 17 | 10 | 19 | 20
all/(iia)/GIN     | 2  | 8  | 3  | 13 | 3  | –  | 3  | 11 | 18
all/(iib)/1-GNN   | 10 | 15 | 10 | 17 | 10 | 16 | –  | 16 | 19
all/(iib)/GIN     | 1  | 5  | 3  | 6  | 1  | 3  | 2  | –  | 18
relaxed WL kernel | 0  | 1  | 1  | 0  | 0  | 1  | 1  | 1  | –

The significant differences in Table 3 reveal that representations originating from the same kind of GNN never lead to significantly different accuracies, neither those originating from a 1-GNN, nor those originating from a GIN. On the other hand, if one of the representations originates from a 1-GNN and the other from a GIN, then the difference between the accuracies is significant, and similarly if one of them is GNN-based and the other originates from the relaxed WL kernel. Combined with the results in Table 2, this means that any of the four representations originating from the 1-GNN most frequently yields the highest accuracy, whereas the differences between those four representations are not significant. The fact that the accuracies of the four representations originating from the same kind of GNN were significantly different neither for 1-GNN nor for GIN suggests testing the stronger hypothesis that the mean classification accuracy of all those four representations coincides. For 1-GNN, the Friedman test indeed did not reject that hypothesis, with a quite high achieved significance p = 0.19. For GIN, it rejected the hypothesis at the usual significance level 5%, but already at the significance level 4% it did not reject it any more (achieved significance p = 0.043).
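The significance analysis follows a standard recipe: an omnibus Friedman test over all representations, followed by pairwise two-sided Wilcoxon signed-rank tests with a Holm correction. A compact sketch with SciPy is given below; it is a simplified variant (a plain Friedman test without replicates) and assumes `acc` is an array of fold accuracies with one column per representation.

```python
from itertools import combinations
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

def significance_analysis(acc, alpha=0.05):
    """acc: (n_folds, n_representations) array of validation-fold accuracies."""
    # Omnibus test of the null hypothesis that all mean accuracies coincide.
    _, p_friedman = friedmanchisquare(*[acc[:, j] for j in range(acc.shape[1])])
    # Pairwise post-hoc Wilcoxon signed-rank tests, two-sided alternative.
    pairs = list(combinations(range(acc.shape[1]), 2))
    p_values = np.array([wilcoxon(acc[:, a], acc[:, b]).pvalue for a, b in pairs])
    # Holm step-down correction: compare sorted p-values to alpha / (m - rank).
    order = np.argsort(p_values)
    m, significant = len(pairs), np.zeros(len(pairs), dtype=bool)
    for rank, idx in enumerate(order):
        if p_values[idx] > alpha / (m - rank):
            break
        significant[idx] = True
    return p_friedman, dict(zip(pairs, zip(p_values, significant)))
```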
4 Conclusion and Further Research

This work-in-progress paper experimentally compared 8 variants of graph representations based on two recently proposed graph neural networks inspired by the WL isomorphism test, as well as a representation based on a recent generalization of the WL subtree kernel. They were compared not with respect to their distance in the embedding space, but with respect to downstream classification by means of an SVM. The results of that comparison indicate that the highest classification accuracy is achieved with representations originating from 1-GNN networks. At the same time, the comparison revealed that any two representations originating from different kinds of GNNs always lead to significantly different accuracies, and also the accuracies of a GNN-based representation and of the representation originating from the relaxed WL kernel differ significantly. On the other hand, the four accuracies of representations originating from the same kind of GNN are never significantly different. Moreover, the data do not even contradict the stronger hypothesis that the mean classification accuracy of all those representations coincides.

The result that the representation based on the relaxed WL kernel was inferior to the GNN-based representations is explainable by the fact that the kernel was designed for dense and structurally more diverse graphs, whereas the reported investigation was performed on simple molecular graphs.

However, the paper presents really only first results that, in our opinion, indicate the usefulness of possible further research in this direction, which we consider interesting due to the crucial role nowadays played by representation learning. Needless to say, such research would have to be more comprehensive with respect to the employed graph data, as well as with respect to the involved representation methods.

As to the employed data, we used 20 comparatively small disjoint graphsets from only four different benchmark datasets, in order to be able to assess the statistical significance of the differences between the compared representations. Using instead a similar number of separate benchmarks would lead to much larger graphsets, would allow covering a broader spectrum of applications, and would allow dropping the assumption that for the employed data, disjointness is a sufficient condition for statistical independence. The benchmarks for future investigation should also include dense and structurally diverse graphs, i.e. the kind of graphs for which the relaxed WL kernel is intended. In addition, it would be interesting to know how the GNN-based representations change if, instead of the colouring components (λ_1, ..., λ_{d_c}) of a labelling λ, the complete labelling is used.

As to the involved representation methods, it would be interesting to compare the relaxed WL kernel with methods based on other GNNs inspired by the WL test. Indeed, other variants of the WL subtree kernel have been sufficiently compared with the relaxed WL kernel in [23] and shown to be inferior to it, and several of them have also already been compared with GNN-based methods inspired by the WL test [1, 11, 15]. On the other hand, comparisons among different methods are sporadic [3, 11, 15], and a comparison of some of them with the relaxed WL kernel is, to the best of our knowledge, reported for the first time in this paper.

Acknowledgement

The authors are deeply indebted to Till Hendrik Schulz for repeated discussions and for his help with the relaxed WL kernel and with its implementation GenWL. The research reported in this paper has been supported by the Charles University SVV project 260575 and partially supported by the Czech Science Foundation (GAČR) grant 18-18080S.
References

[1] R. Al-Rfou, D. Zelle, and B. Perozzi. DDGK: Learning graph representations for deep divergence graph kernels. In International World Wide Web Conference, pages 37–48, 2019.
[2] A. Benavoli, G. Corani, and F. Mangili. Should we really use post-hoc tests based on mean-ranks? Journal of Machine Learning Research, 17:1–10, 2016.
[3] Z. Chen, S. Villar, L. Chen, and J. Bruna. On the equivalence between graph isomorphism testing and function approximation with GNNs. In 33rd Conference on Neural Information Processing Systems, pages 1–19, 2019.
[4] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2:303–314, 1989.
[5] H. Dell, M. Grohe, and G. Rattan. Lovász meets Weisfeiler and Leman. In 45th International Colloquium on Automata, Languages, and Programming, article no. 40, 2018.
[6] F. Errica, D. Bacciu, and A. Micheli. Theoretically expressive and edge-aware graph learning. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pages 175–180, 2020.
[7] S. Garcia and F. Herrera. An extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all pairwise comparisons. Journal of Machine Learning Research, 9:2677–2694, 2008.
[8] GenWL: A Generalized Weisfeiler-Lehman Graph Kernel, 2021. https://github.com/mlai-bonn/GenWL.
[9] A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, 2016.
[10] W.L. Hamilton. Graph Representation Learning. Morgan & Claypool Publishers, San Rafael, 2020.
[11] N.T. Hoang and T. Maehara. Graph homomorphism convolution. In 37th International Conference on Machine Learning, pages 1–15, 2020.
[12] K. Hornik. Approximation capabilities of multilayer neural networks. Neural Networks, 4:251–257, 1991.
[13] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.
[14] V. Kůrková. Kolmogorov's theorem and multilayer neural networks. Neural Networks, 5:501–506, 1992.
[15] H. Maron, H. Ben-Hamu, H. Serviansky, and Y. Lipman. Provably powerful graph networks. In 33rd Conference on Neural Information Processing Systems, pages 1–12, 2019.
[16] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[17] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In 27th Conference on Neural Information Processing Systems, pages 3111–3119, 2013.
[18] C. Morris, M. Ritzert, M. Fey, W.L. Hamilton, J.E. Lenssen, et al. Weisfeiler and Leman go neural: Higher-order graph neural networks. In 33rd AAAI Conference on Artificial Intelligence, pages 4602–4609, 2019.
[19] J. Park and I.W. Sandberg. Approximation and radial-basis-function networks. Neural Computation, 5:305–316, 1993.
[20] B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710, 2014.
[21] PyTorch Geometric Documentation, 2021. https://pytorch-geometric.readthedocs.io/.
[22] B. Schölkopf and A.J. Smola. Learning with Kernels. MIT Press, Cambridge, 2002.
[23] T.H. Schulz, T. Horváth, P. Welke, and S. Wrobel. A generalized Weisfeiler-Lehman graph kernel. arXiv preprint arXiv:2101.08104v1, 2021.
[24] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, and K.M. Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12:2539–2561, 2011.
[25] J.J. Sutherland, L.A. O'Brien, and D.F. Weaver. Spline-fitting with a genetic algorithm: A method for developing classification structure-activity relationships. Journal of Chemical Information and Computer Sciences, 43:1906–1915, 2003.
[26] N. Wale, I.A. Watson, and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems, 14:347–375, 2008.
[27] B. Weisfeiler and A.A. Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, 2:12–16, 1968.
[28] H. White. Artificial Neural Networks: Approximation and Learning Theory. Blackwell Publishers, Cambridge, 1992.
[29] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, pages 1–17, 2019.