Exploring Representation of Horn Clauses using
GNNs
Chencheng Liang1 , Philipp Rümmer1,2 and Marc Brockschmidt3
1
  Uppsala University, Department of Information Technology, Uppsala, Sweden
2
  University of Regensburg, Regensburg, Germany
3
  Microsoft Research


Abstract

In recent years, the application of machine learning to program verification, and the embedding of programs to capture semantic information, have been recognised as important tools by many research groups. Learning program semantics from raw source code is challenging due to the complexity of real-world programming language syntax and due to the difficulty of reconstructing long-distance relational information implicitly represented in programs using identifiers. Addressing the first point, we consider Constrained Horn Clauses (CHCs) as a standard representation of program verification problems, providing a simple and programming-language-independent syntax. For the second challenge, we explore graph representations of CHCs, and propose a new Relational Hypergraph Neural Network (R-HyGNN) architecture to learn program features.
   We introduce two different graph representations of CHCs. The first, the constraint graph (CG), emphasizes syntactic information of CHCs by translating the symbols in CHCs and the relations between them into typed nodes and binary edges, respectively, and representing constraints as abstract syntax trees. The second, the control- and data-flow hypergraph (CDHG), emphasizes semantic information of CHCs by representing control and data flow through ternary hyperedges.
   We then propose a new GNN architecture, R-HyGNN, extending Relational Graph Convolutional Networks to handle hypergraphs. To evaluate the ability of R-HyGNN to extract semantic information from programs, we use R-HyGNN to train models on the two graph representations, and on five proxy tasks of increasing difficulty, using benchmarks from CHC-COMP 2021 as training data. The most difficult proxy task requires the model to predict the occurrence of clauses in counter-examples, which subsumes the satisfiability of CHCs. CDHG achieves 90.59% accuracy in this task. Furthermore, R-HyGNN gives perfect predictions on one of the graphs, consisting of more than 290 clauses. Overall, our experiments indicate that R-HyGNN can capture intricate program features for guiding verification problems.

Keywords
Constrained Horn clauses, Graph Neural Networks, Automatic program verification




PAAR’22: 8th Workshop on Practical Aspects of Automated Reasoning, August 11–12, 2022, Haifa, Israel
email: chencheng.liang@it.uu.se (C. Liang); philipp.ruemmer@it.uu.se (P. Rümmer);
marc@marcbrockschmidt.de (M. Brockschmidt)
url: https://github.com/ChenchengLiang/ (C. Liang); https://github.com/pruemmer/ (P. Rümmer);
https://github.com/mmjb/ (M. Brockschmidt)
orcid: 0000-0002-4926-8089 (C. Liang); 0000-0002-2733-7098 (P. Rümmer); 0000-0001-6277-2768
(M. Brockschmidt)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
1. Introduction
Automatic program verification is challenging because of the complexity of industrially
relevant programs. In practice, constructing domain-specific heuristics from program
features (e.g., information from loops, control flow, or data flow) is essential for solving
verification problems. For instance, [1] and [2] extract semantic information by performing systematic static analysis to refine abstractions for systems based on counterexample-guided abstraction refinement (CEGAR) [3]. However, manually designed heuristics usually target a specific domain and are hard to transfer to other problems. With the rapid development of deep learning in recent years, learning-based methods have evolved quickly and attracted increasing attention. For example, program features are provided explicitly in [4, 5] to decide which algorithm is most promising for verifying a given program. Later, in [6, 7], program features are learned in an end-to-end pipeline.
Moreover, some generative models [8, 9] are also introduced to produce essential informa-
tion for solving verification problems. For instance, Code2inv [10] embeds the programs
by graph neural networks (GNNs) [11] and learns to construct loop invariants by a deep
neural reinforcement framework.
   For deep learning-based methods, regardless of how the learning pipeline and the neural network structure are designed, learning to represent semantic program features is essential and challenging, (a) because the syntax of source code varies across programming languages, conventions, regulations, and even syntactic sugar, and (b) because it requires capturing intricate semantics from long-distance relational information based on re-occurring identifiers. For the first challenge, since source code
is not the only way to represent a program, learning from other formats is a promising
direction. For example, inst2vec [12] learns control and data flow from LLVM intermediate
representation [13] by recursive neural networks (RNNs) [14]. Constrained Horn Clauses
(CHCs) [15], as an intermediate verification language, consist of logic implications and
constraints and can alleviate the difficulty since they can naturally encode program logic
with simple syntax. For the second challenge, we use graphs to represent CHCs and learn program features with GNNs, since GNNs can learn from the structural information within a node's N-hop neighbourhood through a recursive neighbourhood aggregation (i.e., neural message passing) procedure.
   In this work, we explore how to learn program features from CHCs by answering two
questions: (1) What kind of graph representation is suitable for CHCs? (2) Which kind
of GNN is suitable to learn from the graph representation?
   For the first point, we introduce two graph representations for CHCs: the constraint
graph (CG) and control- and data-flow hypergraph (CDHG). The constraint graph encodes
the CHCs into three abstract layers (predicate, clause, and constraint layers) to preserve as
much structural information as possible (i.e., it emphasizes program syntax). The control- and data-flow hypergraph, on the other hand, uses ternary hyperedges to capture the flow of control and data in CHCs, emphasizing program semantics. To better express control
and data flow in CDHG, we construct it from normalized CHCs. The normalization
changes the format of the original CHC but retains logical meaning. We assume that
different graph representations of CHCs capture different aspects of semantics. The two
Table 1
Proxy tasks used to evaluate the suitability of different graph representations.

   Task                                       Task type                    Description
   1. Argument identification                 Node binary classification   For each element in the CHCs, predict whether it is an argument of a relation symbol.
   2. Count occurrences of relation           Node regression              For each relation symbol, predict how many times it occurs in all clauses.
      symbols in all clauses
   3. Relation symbol occurrence in SCCs      Node binary classification   For each relation symbol, predict whether a cycle exists from the node to itself (membership in a strongly connected component, SCC).
   4. Existence of argument bounds            Node binary classification   For each argument of a relation symbol, predict whether it has a lower or upper bound.
   5. Clause occurrence in counter-examples   Node binary classification   For each CHC, predict whether it appears in counter-examples.


graph representations can be used as baselines for constructing new graph representations of CHCs that capture different semantics. In addition, similar to the idea in [16], our graph representations are invariant to the concrete symbol names in the CHCs, since we map these to typed nodes.
   For the second point, we propose a Relational Hypergraph Neural Network (R-HyGNN),
an extension of Relational Graph Convolutional Networks (R-GCN) [17]. Similar to the
GNNs used in LambdaNet [18], R-HyGNN can handle hypergraphs by concatenating
the node representations involved in a hyperedge and passing the messages to all nodes
connected by the hyperedge.
   Finally, we evaluate our framework (two graph representations of CHCs and R-HyGNN)
by five proxy tasks (see details in Table 1) of increasing difficulty. Task 1 requires the framework to learn to classify syntactic information of CHCs, which is explicitly encoded in the two graph representations. Task 2 requires R-HyGNN to solve a syntactic counting task. Task 3 requires R-HyGNN to approximate Tarjan's algorithm [19], which solves a general graph-theoretic problem. Task 4 is much harder than the previous three tasks, since the existence of argument bounds is undecidable. Task 5 is harder than solving CHCs, since it predicts the trace of counter-examples (CEs). Note that Tasks 1 to 3 can easily be solved by specific, dedicated standard algorithms. We include them to systematically study the representational power of graph neural networks applied to different graph construction methods. However, we speculate that using these tasks as pre-training objectives for neural networks that are later fine-tuned to specific (data-poor) tasks may be a successful strategy, which we plan to study in future work.
   Our benchmark data is extracted from the 8705 linear and 8425 non-linear Linear
Integer Arithmetic (LIA) problems in the CHC-COMP repository1 (see Table 1 in the
competition report [20]). The verification problems come from various sources (e.g.,
higher-order program verification benchmark2 and benchmarks generated with JayHorn3 ),
   1
     https://chc-comp.github.io/
   2
     https://github.com/chc-comp/hopv
   3
     https://github.com/chc-comp/jayhorn-benchmarks
and therefore cover programs of different sizes and complexity. We collect and form the training, validation, and test data using the predicate abstraction-based model checker Eldarica [21]. We implement R-HyGNN4 based on the framework tf2_gnn5. Our code is available in a GitHub repository6. For both graph representations, although prediction accuracy decreases with increasing task difficulty, R-HyGNN still achieves high accuracy even for the undecidable problems of Task 4, namely 91% and 94% for the constraint graph and CDHG, respectively. Moreover, in Task 5, beyond the high accuracy (96%) achieved by CDHG, R-HyGNN gives perfect predictions on one of the graphs, consisting of more than 290 clauses, which is impossible to achieve by learning simple patterns (e.g., predicting the clause containing false as positive). Overall, our experiments show that our framework learns not only explicit syntax but also intricate semantics.

Contributions of the paper. (i) We encode CHCs into two graph representations,
emphasising abstract program syntactic and semantic information, respectively. (ii) We
extend a message passing-based GNN, R-GCN, to R-HyGNN to handle hypergraphs.
(iii) We introduce five proxy supervised learning tasks to explore the capacity of R-
HyGNN to learn semantic information from the two graph representations. (iv) We
evaluate our framework on the CHC-COMP benchmark and show that this framework
can learn intricate semantic information from CHCs and has the potential to produce
good heuristics for program verification.


2. Background
2.1. From Program Verification to Horn Clauses
Constrained Horn Clauses are logical implications involving unknown predicates. They can be used to encode many formalisms, such as transition systems, concurrent systems, and interactive systems. The connection between program logic and CHCs is bridged by Floyd-Hoare logic [22, 23], allowing program verification problems to be encoded as CHC satisfiability problems [24]. In this setting, a program is guaranteed to satisfy a specification if the encoded CHCs are satisfiable, and vice versa.
   We write CHCs in the form H ← B1 ∧ · · · ∧ Bn ∧ φ, where (i) each Bi is an application qi(t̄i) of a relation symbol qi to a list of first-order terms t̄i; (ii) H is either an application q(t̄), or false; (iii) φ is a first-order constraint. Here, H and B1 ∧ · · · ∧ Bn ∧ φ, on the left- and right-hand side of the implication ←, are called the "head" and "body", respectively.
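To make the clause shape concrete, the form H ← B1 ∧ · · · ∧ Bn ∧ φ can be captured by a small Python data structure. This is an illustrative sketch of ours (the names Literal and CHC are not from the paper), with the constraint kept as an opaque string:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    relation: str   # relation symbol q, or "false" for the head
    args: tuple     # argument terms t̄

@dataclass(frozen=True)
class CHC:
    head: Literal    # H: an application q(t̄), or false
    body: tuple      # B1, ..., Bn: relation-symbol applications
    constraint: str  # φ: a first-order constraint, kept opaque here

# The clause  false ← L(x, y, n) ∧ x = 0 ∧ y ≠ 0  from the running example:
clause = CHC(
    head=Literal("false", ()),
    body=(Literal("L", ("x", "y", "n")),),
    constraint="x = 0 ∧ y ≠ 0",
)
```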
   An example in Figure 1 shows how to encode a verification problem into CHCs. In Figure 1a, we have a verification problem, i.e., a C program with specifications. It has an external input n, and we initially assume that x == n, y == n, and n >= 0. While x is not equal to 0, x and y are each decreased by 1. The assertion is that, finally, y == 0. This program can be encoded into the CHCs shown in Figure 1b. The variables x and y

   4
     https://github.com/ChenchengLiang/tf2-gnn
   5
     https://github.com/microsoft/tf2-gnn
   6
     https://github.com/ChenchengLiang/Systematic-Predicate-Abstraction-using-Machine-Learning
    0  extern int n;
    1  void main() {
    2    int x, y;
    3    assume(x == n && y == n && n >= 0);
    4    while (x != 0) {
    5      x--;
    6      y--;
    7    }
    8    assert(y == 0);
    9  }

       (a) A verification problem written in C.

            L0(n) ← true                                    line 0
            L1(n) ← L0(n)                                   line 1
       L2(x, y, n) ← L1(n)                                  line 2
       L3(x, y, n) ← L2(x, y, n) ∧ n ≥ 0 ∧ x = n ∧ y = n    line 3
       L8(x, y, n) ← L3(x, y, n) ∧ x = 0                    line 4
       L4(x, y, n) ← L3(x, y, n) ∧ x ≠ 0                    line 4
       L5(x, y, n) ← L4(x′, y, n) ∧ x = x′ − 1              line 5
       L6(x, y, n) ← L5(x, y′, n) ∧ y = y′ − 1              line 6
       L3(x, y, n) ← L6(x, y, n)                            line 6
             false ← L8(x, y, n) ∧ y ≠ 0                    line 8

       (b) CHCs encoded from the C program in Figure 1a.

        L(x, y, n) ← n ≥ 0 ∧ x = n ∧ y = n                                      line 3
        L(x, y, n) ← L(x′, y′, n′) ∧ x′ ≠ 0 ∧ x = x′ − 1 ∧ y = y′ − 1 ∧ n = n′  line 4-7
             false ← L(x, y, n) ∧ x = 0 ∧ y ≠ 0                                 line 8

       (c) Simplified CHCs from Figure 1b.

Figure 1: An example showing how to encode a verification problem written in C into CHCs. For the C program, the numbers on the left-hand side indicate line numbers. The line numbers in Figures 1b and 1c refer to the corresponding lines in Figure 1a. For example, the clause L0(n) ← true in Figure 1b is derived from line 0, "extern int n;", in Figure 1a.


     are universally quantified. We can further simplify the CHCs in Figure 1b to the CHCs shown in Figure 1c, without changing satisfiability, by preprocessing steps such as inlining and slicing [25]. In Figure 1c, the first CHC encodes line 3, i.e., the assume statement; the second clause encodes lines 4-7, i.e., the while loop; and the third clause encodes line 8, i.e., the assert statement in Figure 1a. Solving the CHCs is equivalent to answering the verification problem: for a given n, if the CHCs are satisfiable for all x and y, then the program is guaranteed to satisfy the specification.

    2.2. Graph Neural Networks
     Let G = (V, R, E, X, ℓ) denote a graph, in which V is a set of nodes, R is a set of edge types, E ⊆ V × V × R is a set of typed edges, X is a set of node types, and ℓ : V → X is a labelling map from nodes to their types. A tuple e = (u, v, r) ∈ E denotes an edge from node u to v with edge type r.
        Message passing-based GNNs use a neighbourhood aggregation strategy, where at time step t, each node updates its representation h_v^t by aggregating the representations of its neighbours and then combining them with its own representation. The initial node representation h_v^0 is usually derived from the node's type or label ℓ(v). The common assumption behind this architecture is that after T iterations, the node representation h_v^T captures local information within the T-hop neighbourhood. Most GNN architectures [26, 27] can be characterized by their "aggregation" function ρ and "update" function φ. The node representation in the t-th layer of such a GNN is then computed as

         h_v^t = φ(ρ({h_u^{t−1} | u ∈ N_v^r, r ∈ R}), h_v^{t−1}),

where R is the set of edge types and N_v^r is the set of neighbours of v under edge type r.
   A GNN architecture closely related to R-HyGNN is R-GCN [17], which updates node representations by

         h_v^t = σ( Σ_{r∈R} Σ_{u∈N_v^r} (1/c_{v,r}) W_r^t h_u^{t−1} + W_0 h_v^{t−1} ),        (1)

where W_r is an edge-type-dependent trainable weight matrix, W_0 is a trainable weight matrix for the self-loop, c_{v,r} is a learnable or fixed normalisation factor, and σ is an activation function.
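As a concrete (unofficial) sketch of the update rule (1), the following NumPy function implements one R-GCN layer, fixing the normalisation factor c_{v,r} to the in-degree of v under edge type r; all function and parameter names here are ours, not the paper's implementation:

```python
import numpy as np

def rgcn_layer(h, edges, weights, w_self):
    """One R-GCN update (Eq. 1):
    h_v^t = ReLU( sum_r sum_{u in N_v^r} (1/c_{v,r}) W_r h_u  +  W_0 h_v ).

    h:       (num_nodes, d) node representations h^{t-1}
    edges:   dict edge_type -> list of (u, v) pairs (edge from u to v)
    weights: dict edge_type -> (d, d) matrix W_r
    w_self:  (d, d) self-loop matrix W_0
    """
    num_nodes, d = h.shape
    out = h @ w_self.T  # self-loop term W_0 h_v for every node
    for r, pairs in edges.items():
        # fixed normalisation c_{v,r}: in-degree of v under edge type r
        deg = np.zeros(num_nodes)
        for _, v in pairs:
            deg[v] += 1
        for u, v in pairs:
            out[v] += (weights[r] @ h[u]) / deg[v]
    return np.maximum(out, 0)  # ReLU as the activation sigma
```

For instance, with identity weight matrices and a single edge (0, 1) of one type, node 1 receives node 0's representation added to its own self-loop term.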


3. Graph Representations for CHCs
Graphs as a representation format support arbitrary relational structure and thus can
naturally encode information with rich structures like CHCs. We define two graph
representations for CHCs that emphasize the program syntax and semantics, respectively.
We map all symbols in CHCs to typed nodes and use typed edges to represent their
relations. In this section, we give concrete examples to illustrate how to construct the
two graph representations from a single CHC modified from Figure 1c. In the examples, we first give the intuition behind the graph design and then describe how to construct the graph step by step. To better visualize the construction of the two graph representations in Figures 2 and 3, the concrete symbol names of the typed nodes are shown in blue boxes. R-HyGNN does not use these names (which, as a result of repeated transformations, usually do not carry any meaning anyway) and only consumes the node types. The formal definitions of the graph representations, and the algorithms for constructing them from multiple CHCs, are given in the full version of this paper [28]. Note that the two graph representations in this study were designed empirically; they can serve as baselines from which variations can be created to fit different purposes.

3.1. Constraint Graph (CG)
The constraint graph is a directed graph with binary edges, designed to emphasize syntactic information in CHCs. A concrete example of constructing the constraint graph for a single CHC L(x, y, n) ← L(x′, y′, n′) ∧ x ≠ 0 ∧ x = x′ − 1 ∧ y = y′ − 1, modified from Figure 1c, is shown in Figure 2. The corresponding node and edge types are described in Tables 2 and 3 in the full version of this paper [28].
   We construct the constraint graph by parsing the CHCs from three different aspects (relation symbols, clause structure, and constraints) and building relations between them. In other words, a constraint graph consists of three layers: the predicate layer depicts the relation between relation symbols and their arguments; the clause layer describes the abstract syntax of the head and body items of the CHC; and the constraint layer represents the constraint by abstract syntax trees (ASTs).

Figure 2: Construction of the constraint graph for the CHC L(x, y, n) ← L(x′, y′, n′) ∧ x ≠ 0 ∧ x = x′ − 1 ∧ y = y′ − 1. Note that some nodes have multiple concrete symbol names (e.g., node rsa1 has two concrete names, x and x′), since one relation symbol may bind to different arguments.


Constructing a constraint graph. We now describe step by step how to construct a constraint graph for the single CHC L(x, y, n) ← L(x′, y′, n′) ∧ x ≠ 0 ∧ x = x′ − 1 ∧ y = y′ − 1. The steps correspond to the steps in Figure 2. In the first step, we draw relation symbols and their arguments as typed nodes and build the connections between them. In the second step, we construct the clause layer by drawing clauses, the relation symbols in the head and body, and their arguments as typed nodes, and build the relations between them. In the third step, we construct the constraint layer by drawing ASTs for the sub-expressions of the constraint. In the fourth step, we add connections between the three layers. The predicate and clause layers are connected by relation symbol instance (RSI) and argument instance (AI) edges, which express that the elements in the predicate layer are instances of elements in the clause layer. The clause and constraint layers are connected by GUARD and DATA edges, since the constraint is the guard of the clause implication, and the constraint and clause layers share some elements.
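The first step above, building the predicate layer, can be sketched as a simple enumeration of typed nodes and edges. The node types rs and rsa follow the paper, but the edge label "argument" and the function name are illustrative assumptions of ours:

```python
def predicate_layer(relation_symbols):
    """Step 1 of the constraint-graph construction: one typed node per
    relation symbol ("rs") and per argument position ("rsa"), joined by
    "argument" edges (edge label is our own illustrative choice)."""
    nodes, edges = [], []
    for rel, arity in relation_symbols:
        nodes.append((rel, "rs"))
        for i in range(arity):
            arg = f"{rel}_arg_{i}"          # placeholder argument name
            nodes.append((arg, "rsa"))
            edges.append((rel, arg, "argument"))
    return nodes, edges

# Relation symbol L with three arguments (x, y, n) from the running example:
nodes, edges = predicate_layer([("L", 3)])
```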

3.2. Control- and Data-flow Hypergraph (CDHG)
In contrast to the constraint graph, the CDHG representation emphasizes the semantic
information (control and data flow) in the CHCs by hyperedges which can join any number
of vertices. To represent control and data flow in CDHG, first, we preprocess every CHC
and then split the constraint into control and data flow sub-formulas.

Normalization. We normalize every CHC by applying the following rewriting steps:
(i) We ensure that every relation symbol occurs at most once in every clause. For instance,
the CHC q(a) ← q(b) ∧ q(c) has multiple occurrences of the relation symbol q, and we
Table 2
Control- and data-flow sub-formulas in the constraints of the normalized CHCs from Figure 1c.

   Normalized CHC                                                  Control-flow sub-formulas   Data-flow sub-formulas
   L(x, y, n) ← n ≥ 0 ∧ x = n ∧ y = n                              n ≥ 0                       x = n, y = n
   L(x, y, n) ← L′(x′, y′, n′) ∧ x ≠ 0 ∧ x = x′ − 1 ∧ y = y′ − 1   x ≠ 0                       x = x′ − 1, y = y′ − 1
   L′(x′, y′, n′) ← L(x, y, n) ∧ x′ = x ∧ y′ = y ∧ n′ = n          empty                       x′ = x, y′ = y, n′ = n
   false ← L(x, y, n) ∧ x = 0 ∧ y ≠ 0                              y ≠ 0                       x = 0


normalize it to the equi-satisfiable CHCs q(a) ← q′(b) ∧ q″(c), q′(b) ← q(b′) ∧ b = b′, and q″(c) ← q(c′) ∧ c = c′. (ii) We associate each relation symbol q with a unique vector of pairwise distinct argument variables x̄q, and rewrite every occurrence of q to the form q(x̄q). In addition, all argument vectors x̄q are kept disjoint. The normalized CHCs from Figure 1c are shown in Table 2.
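Rewriting step (i) can be sketched as follows. Literals are (relation, args) pairs; the returned aux list records, for each fresh symbol q′, the auxiliary clause q′(x̄) ← q(x̄′) ∧ x̄ = x̄′ that must be added to the system. This is our own illustrative encoding, not the paper's implementation:

```python
from collections import Counter

def normalize_occurrences(head, body):
    """Ensure every relation symbol occurs at most once per clause by
    renaming repeated occurrences in the body to fresh symbols q', q'', ...
    Returns the rewritten clause and a list (fresh, original, args) of
    auxiliary clauses fresh(x) <- original(x') /\ x = x' to be added."""
    seen = Counter([head[0]])   # the head occurrence counts as well
    new_body, aux = [], []
    for rel, args in body:
        if seen[rel] == 0:
            new_body.append((rel, args))       # first occurrence: keep
        else:
            fresh = rel + "'" * seen[rel]      # q', q'', ...
            new_body.append((fresh, args))
            aux.append((fresh, rel, args))     # auxiliary defining clause
        seen[rel] += 1
    return (head, new_body), aux

# The paper's example  q(a) <- q(b) /\ q(c):
(clause, aux) = normalize_occurrences(("q", ("a",)), [("q", ("b",)), ("q", ("c",))])
```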

Splitting constraints into control- and data-flow sub-formulas. We can rewrite the constraint φ as a conjunction φ = φ1 ∧ · · · ∧ φk, k ∈ ℕ. A sub-formula φi is called a "data-flow sub-formula" if and only if it can be rewritten to the form x = t(ȳ) such that (i) x is one of the arguments of the head q(x̄q); and (ii) t(ȳ) is a term over variables ȳ, where each element of ȳ is an argument of some body literal q′(x̄q′). We call all other φj "control-flow sub-formulas". A constraint φ can then be represented as φ = g1 ∧ · · · ∧ gm ∧ d1 ∧ · · · ∧ dn, where m, n ∈ ℕ and the gi and dj are control- and data-flow sub-formulas, respectively. The control- and data-flow sub-formulas for the normalized CHCs of our running example are shown in Table 2.
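The classification of conjuncts described above can be sketched in Python. Formulas are represented as strings, ≠ is written as !=, and primed variables as x1, y1, n1; this is a simplification of ours (a real implementation would work on parsed terms rather than regular expressions):

```python
import re

def split_constraint(conjuncts, head_args, body_args):
    """Classify each conjunct of phi as a data-flow sub-formula
    (x = t(y) with x a head argument and every variable of t an argument
    of some body literal) or a control-flow sub-formula."""
    control, data = [], []
    for phi in conjuncts:
        m = re.match(r"^(\w+)\s*=\s*(.+)$", phi)  # '!=' does not match '='
        if m and m.group(1) in head_args:
            rhs_vars = set(re.findall(r"[a-zA-Z_]\w*", m.group(2)))
            if rhs_vars <= set(body_args):
                data.append(phi)
                continue
        control.append(phi)
    return control, data

# Second normalized clause of Table 2, with primes written as '1':
control, data = split_constraint(
    ["x1 != 0", "x = x1 - 1", "y = y1 - 1"],
    head_args={"x", "y", "n"},
    body_args={"x1", "y1", "n1"},
)
```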

Constructing a CDHG. The CDHG represents program control and data flow by guarded control-flow hyperedges (CFHE) and data-flow hyperedges (DFHE). A CFHE denotes the flow of control from the body literals to the head literal of a CHC. A DFHE denotes how data flows from the body to the head. Both control and data flow are guarded by the control-flow sub-formulas.
    The construction of the CDHG for the normalized CHC L(x, y, n) ← L′(x′, y′, n′) ∧ x ≠ 0 ∧ x = x′ − 1 ∧ y = y′ − 1 is shown in Figure 3. The corresponding node and edge types are described in Tables 5 and 6 in the full version of this paper [28].
    In the first step, we draw relation symbols and their arguments as typed nodes and build the relations between them. In the second step, we add a guard node and draw ASTs for the control-flow sub-formulas. In the third step, we construct guarded control-flow hyperedges by connecting the relation symbols in the head and body with the guard node, which in turn connects to the roots of the control-flow sub-formulas. In the fourth step, we construct the ASTs for the right-hand side of every data-flow sub-formula. In the fifth step, we construct the guarded data-flow hyperedges by connecting the left- and right-hand sides of the data-flow sub-formulas with the guard node. Note that the diamond shapes in Figure 3 are not nodes of the graph but are used to visualize our (ternary) hyperedges of types CFHE and DFHE. We draw them in this way to better visualize the CDHG.

Figure 3: Construction of the CDHG for the CHC L(x, y, n) ← L′(x′, y′, n′) ∧ x ≠ 0 ∧ x = x′ − 1 ∧ y = y′ − 1.


4. Relational Hypergraph Neural Network
Unlike regular graphs, which connect nodes by binary edges, a CDHG includes hyperedges that connect an arbitrary number of nodes. Therefore, we extend R-GCN to R-HyGNN to handle hypergraphs. The update rule for the node representation at time step t in R-HyGNN is

         h_v^t = ReLU( Σ_{r∈R} Σ_{p∈P_r} Σ_{e∈E_v^{r,p}} W_{r,p}^t · ∥[h_u^{t−1} | u ∈ e] ),        (2)

where ∥[·] denotes the concatenation of all elements of a list, R = {r_i | i ∈ ℕ} is the set of edge types (relations), P_r = {p_j | j ∈ ℕ} is the set of node positions under edge type r, W_{r,p}^t denotes the learnable parameters for nodes in the p-th position of edges of type r, and E_v^{r,p} is the set of hyperedges of type r in which node v appears in position p; each e is a list of nodes. The update of the representation of node v at time step 1 is illustrated in Figure 4.
  Note that different edge types may have the same number of connected nodes. For
instance, in CDHG, CFHE and DFHE are both ternary edges.
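As an unofficial sketch of the update rule (2), the following NumPy function applies one R-HyGNN layer to a hypergraph: for every hyperedge e of type r, the representations of all nodes in e are concatenated, and each node at position p in e receives the concatenation multiplied by its position-specific matrix W_{r,p}. All names are ours, not the paper's implementation:

```python
import numpy as np

def r_hygnn_layer(h, hyperedges, weights):
    """One R-HyGNN update (Eq. 2):
    h_v^t = ReLU( sum_r sum_p sum_{e in E_v^{r,p}} W_{r,p} . concat[h_u | u in e] ).

    h:          (num_nodes, d) node representations h^{t-1}
    hyperedges: dict edge_type -> list of node tuples (arity k_r)
    weights:    dict edge_type -> list of k_r matrices, each (d, k_r * d)
    """
    out = np.zeros_like(h)
    for r, edges in hyperedges.items():
        for e in edges:
            msg = np.concatenate([h[u] for u in e])  # concat[h_u | u in e]
            for p, v in enumerate(e):
                out[v] += weights[r][p] @ msg        # position-specific W_{r,p}
    return np.maximum(out, 0)                        # ReLU
```

With d = 1 and a single binary edge (0, 1), node 0 is updated through the position-0 matrix and node 1 through the position-1 matrix, both applied to the same concatenated message.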
Figure 4: An example illustrating how node representations are updated in R-HyGNN using (2). On the right-hand side, there are three types of edges connected to node 1. We compute the updated representation h_1^1 of node 1 at time step 1. ∥ denotes concatenation, and x_i is the initial feature vector of node i. The red blocks trace the update of node 1. Edge type 1 is a unary edge, a self-loop; it has one set of learnable parameters as update function, i.e., W_{r0,p1}. Edge type 2 is a binary edge; it has two update functions, W_{r1,p1} and W_{r1,p2}. Node 1 is in the first position in the edges [1,2], [1,3], [1,4], and [1,5], so the concatenated node representations are updated by W_{r1,p1}. For the other two edges, [6,1] and [7,1], node 1 is in the second position, so the concatenated node representations are updated by W_{r1,p2}. For edge type 3, the same rule applies: depending on the position of node 1 in an edge, the concatenated node representation goes through a different parameter set. Since there is no type-3 edge in which node 1 appears in the second position, we represent this case by a dashed box and arrow. The aggregation step adds up all updated representations from the different edge types.


  Overall, our definition of R-HyGNN is a generalization of R-GCN. Concretely, it can be applied directly to the special case of binary graphs, and in that setting it is slightly more powerful, as each message between nodes is computed using the representations of both the source and target nodes, whereas R-GCN only uses the source node representation.
4.1. Training Model
The end-to-end model consists of three components: the first component is an embedding
layer, which learns to map the node’s type (encoded by integers) to the initial feature
vectors; the second component is R-HyGNN, which learns program features; the third
component is a set of fully connected neural networks, which learns to solve a specific
task by using gathered node representations from R-HyGNNs. All parameters in these
three components are trained together. Note that different tasks may gather different
node representations. For example, Task 1 gathers all node representations, while Task 2
only gathers node representations with type rs.
   We set all vector lengths to 64, i.e., the output size of the embedding layer, the sizes of the middle layers in R-HyGNN, and the layer sizes of the fully connected neural networks are all 64. The maximum number of training epochs is 500, and the patience is 100. The number of message passing steps is 8 (i.e., (2) is applied eight times). For the remaining parameters (e.g., learning rate, optimizer, dropout rate, etc.), we use the default settings of the tf2_gnn framework. We chose these parameters empirically according to the graph sizes and structure, and apply the same fixed parameter settings for all tasks and both graph representations, without fine-tuning.


5. Proxy Tasks
We propose five proxy tasks with increasing difficulty to systematically evaluate the
R-HyGNN on the two graph representations. Tasks 1 to 3 evaluate if R-HyGNN can
solve general problems in graphs. In contrast, Tasks 4 and 5 evaluate if combining our
graph representations and R-HyGNN can learn program features to solve the encoded
program verification problems. We first describe the learning target for every task and
then explain how to produce training labels and discuss the learning difficulty.

Task 1: Argument identification. For both graph representations, the R-HyGNN model
performs binary classification on all nodes to predict whether a node's type is
relation symbol argument (rsa); the metric is accuracy. The binary training label
is obtained by reading off the explicit node types. The task evaluates whether
R-HyGNN can differentiate explicit node types; it is easy because the graph
explicitly includes the type information in both typed nodes and edges.
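The label extraction for this task is straightforward; a minimal sketch (the representation of node types as strings is an assumption about the data format, not the paper's actual encoding):

```python
def task1_labels(node_types):
    # Binary label per node: 1 iff the node's type is 'rsa'.
    # node_types maps node id -> type name.
    return {v: int(t == "rsa") for v, t in node_types.items()}
```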

Task 2: Count occurrence of relation symbols in all clauses. For both graph repre-
sentations, the R-HyGNN model performs regression on nodes with type rs to predict
how many times the relation symbol occurs in all clauses. The metric is mean square
error. The training label is obtained by counting the occurrences of every relation
symbol in all clauses. This task is designed to test whether R-HyGNN can correctly
perform a counting task. For example, the relation symbol L occurs four times in
the CHCs in Figure 2, so the training label for node L is the value 4. The task is
harder than Task 1, since the model needs to count the binary edges or hyperedges
connected to a particular node.

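The counting of training labels can be sketched with the standard library; the clause encoding below (each clause given as the list of relation symbols it mentions) is an illustrative assumption:

```python
from collections import Counter

def task2_labels(clauses):
    # Regression target per relation symbol: its number of occurrences
    # across all clauses (head and body occurrences both count).
    return dict(Counter(sym for clause in clauses for sym in clause))
```

For instance, for a hypothetical normalized clause set in which L occurs four times, the label for node L would be 4, matching the Figure 2 example in the text.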
Task 3: Relation symbol occurrence in SCCs. For both graph representations, the
R-HyGNN model performs binary classification on nodes with type rs to predict
whether the node belongs to a non-trivial SCC (i.e., lies in a cycle); the metric
is accuracy. The binary training label is obtained using Tarjan's algorithm [19].
For example, in Figure 2, L belongs to an SCC because L and L′ form a cycle via
the clauses L ← L′ and L′ ← L. This task is designed to evaluate whether R-HyGNN
can recognize general graph structures such as cycles. It requires the model to
classify a graph-theoretic object (membership in an SCC), which is harder than the
previous two tasks, since the model needs to approximate a concrete algorithm
rather than classify or count explicit graphical elements.
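The label generation the model has to approximate can be sketched directly with Tarjan's algorithm [19]; the graph encoding (node list plus directed edge pairs) is an illustrative assumption:

```python
def nontrivial_scc_labels(nodes, edges):
    # Label each node 1 iff it lies in a non-trivial SCC (i.e., a cycle),
    # computed with Tarjan's algorithm. A self-loop also counts as a cycle.
    succ = {v: [] for v in nodes}
    for u, w in edges:
        succ[u].append(w)
    index, low, on_stack, stack = {}, {}, set(), []
    counter = [0]
    labels = {v: 0 for v in nodes}

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            in_cycle = len(comp) > 1 or v in succ[v]
            for w in comp:
                labels[w] = int(in_cycle)

    for v in nodes:
        if v not in index:
            strongconnect(v)
    return labels
```

On the Figure 2 example, L and L′ lie on the cycle L ← L′, L′ ← L, so both get label 1, while an rs node outside any cycle gets label 0.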

Task 4: Existence of argument bounds. For both graph representations, we train
two independent R-HyGNN models which perform binary classification on nodes with
type rsa to predict whether individual arguments have (a) lower and (b) upper
bounds in the least solution of a set of CHCs; the metric is accuracy. To obtain
the training labels, we apply the Horn solver Eldarica to check the correctness
of guessed (and successively increased) lower and upper argument bounds;
arguments for which no bounds can be shown are assumed to be unbounded. We use a
timeout of 3 s for the lower and the upper bound of a single argument,
respectively; the overall timeout for extracting labels from one program is
3 hours. For example, consider the CHCs in Fig. 1c. The CHCs contain a single
relation symbol L; all three arguments of L are bounded from below but not from
above. This task is significantly harder than the previous three, as boundedness
of arguments is an undecidable property that can, in practice, only be
approximated using static analysis methods.
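The guess-and-increase label extraction can be sketched as follows; `check_bound_holds` is a hypothetical stand-in for the Eldarica query (and the loop iteration limit stands in for the 3 s per-argument timeout described above):

```python
def argument_is_bounded(check_bound_holds, start=1, max_attempts=20):
    # Sketch of the Task 4 label extraction for one argument: guess a
    # bound, ask the solver whether it holds in the least solution, and
    # double the guess on failure. If no bound can be shown within the
    # budget, the argument is assumed to be unbounded (label 0).
    bound = start
    for _ in range(max_attempts):
        if check_bound_holds(bound):
            return True       # label 1: a bound was confirmed
        bound *= 2            # successively increase the guessed bound
    return False              # label 0: assumed unbounded
```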

Task 5: Clause occurrence in counter-examples. This task consists of two binary
classification tasks on nodes with type guard (for the CDHG) and with type clause
(for the constraint graph), predicting whether a clause occurs in a
counter-example. These nodes are unique representatives of the individual clauses
of a problem. The task focuses on unsatisfiable sets of CHCs: every unsatisfiable
clause set gives rise to a set of minimal unsatisfiable subsets (MUSes), and the
MUSes correspond to the minimal counter-examples of the clause set. Two models are
trained independently to predict whether a clause belongs to (a) the intersection
or (b) the union of the MUSes of a clause set. The metric for this task is
accuracy. We obtain training data by applying the Horn solver Eldarica [25], in
combination with an optimization library that provides an algorithm to compute
MUSes7. This task is hard, as it attempts to predict an uncomputable binary
labelling of the graph.
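Once the MUSes have been computed, deriving the two labellings is a simple set computation; a sketch (clause ids and MUSes as Python sets are an assumed encoding):

```python
def task5_labels(clauses, muses):
    # Given all clause ids of a problem and its minimal unsatisfiable
    # subsets (each a set of clause ids), derive the two binary labellings:
    # (a) clause lies in the intersection of all MUSes,
    # (b) clause lies in the union of all MUSes.
    inter = set.intersection(*muses) if muses else set()
    union = set.union(*muses) if muses else set()
    return ({c: int(c in inter) for c in clauses},
            {c: int(c in union) for c in clauses})
```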


6. Evaluation
We first describe the dataset used for training and evaluation and then analyse
the experimental results for the five proxy tasks.

   7
       https://github.com/uuverifiers/lattice-optimiser/
Table 3
The number of labelled graph representations extracted from a collection of CHC-COMP benchmarks
[20]. For each SMT-LIB file, the graph representations for Tasks 1–4 are extracted together with a
timeout of 3 hours; those for Task 5 are extracted with a 20-minute timeout. Here, T. denotes Task.
                     SMT-LIB files        Constraint graphs                    CDHGs
                     Total Unsat     T. 1-4 T. 5 (a) T. 5 (b)       T. 1-4    T. 5 (a)   T. 5 (b)
  Linear LIA         8705  1659      2337    881        857         3029      880        883
  Non-linear LIA     8425  3601      3376    1141       1138        4343      1497       1500
  Aligned            17130 5260      5602    1927       1914        5602      1927       1914


6.1. Benchmarks and Dataset
Table 3 shows the number of labelled graph representations obtained from a
collection of CHC-COMP benchmarks [20]. All graphs were constructed by first
running the preprocessor of Eldarica [25] on the clauses, then building the graphs
as described in Section 3, and finally computing the training data. For instance,
for the first four tasks we constructed 2337 labelled constraint graphs from the
8705 benchmarks of the CHC-COMP LIA-Lin track (linear Horn clauses over linear
integer arithmetic). The remaining 6368 benchmarks are not included in the
dataset because (1) the data generation process timed out, (2) the graphs were
too big (more than 10,000 nodes), or (3) no clause was left after simplification.
For Task 5, since the labels are mined from counter-examples, we first identify
unsatisfiable benchmarks using a Horn solver (1-hour timeout) and then construct
the graph representations. This yields 881 and 857 labelled constraint graphs for
Task 5 (a) and (b), respectively, in LIA-Lin.
   Finally, to compare the performance of the two graph representations, we align
the datasets for the two graph representations, yielding 5602 labelled graphs for
the first four tasks and 1927 and 1914 labelled graphs for Task 5 (a) and (b),
respectively. We divide them into training, validation, and test sets with ratios
of 60%, 20%, and 20%. We include all corresponding files for the dataset in a
GitHub repository8 .
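The 60/20/20 split can be sketched as follows (shuffling and the fixed seed are illustrative assumptions; the paper does not specify the splitting procedure in detail):

```python
import random

def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=0):
    # Shuffle the labelled graphs and split them into training,
    # validation, and test sets with the 60/20/20 ratio used in the paper.
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])
```

Applied to the 5602 aligned graphs, this gives 3361 training, 1120 validation, and 1121 test graphs, consistent with the test-set size of 1121 mentioned in Section 6.2.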

6.2. Experimental Results for Five Proxy Tasks
From Table 4, we can see that for all binary classification tasks, the accuracy
for both graph representations is higher than the ratio of the dominant label. For
the regression task, the scatter-plot points are close to the diagonal line. These
results show that R-HyGNN learns the syntactic and semantic information needed for
the tasks rather than following trivial strategies (e.g., predicting all
likelihoods as 0 or 1). Next, we analyse the experimental results for every task.
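The two quantities compared here can be recomputed from the confusion matrices in Table 4; a small sketch, where the mapping of confusion-matrix cells to true/predicted labels follows the caption of Table 4:

```python
def accuracy_and_dominant(tp, fn, fp, tn):
    # tp: true +, predicted +;  fn: true +, predicted -;
    # fp: true -, predicted +;  tn: true -, predicted -.
    # Returns the classification accuracy and the dominant-label baseline,
    # i.e., the accuracy of always predicting the more frequent true label.
    total = tp + fn + fp + tn
    acc = (tp + tn) / total
    dom = max(tp + fn, fp + tn) / total
    return acc, dom
```

For the Task 1 constraint-graph numbers (93863, 0, 0, 1835971), this gives an accuracy of 100% and a dominant-label ratio of about 95.1%, matching Table 4.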

Task 1: Argument identification. When the task is performed on the constraint
graph, the prediction accuracy is 100%, which means R-HyGNN can perfectly identify
whether a node is a relation symbol argument (rsa) node. On the CDHG, the accuracy
is only close to 100%: unlike in the constraint graph, where the number of
incoming and outgoing edges of an rsa node is fixed (i.e., RSA and AI), an rsa
node in the CDHG may connect to a varying number of edges (including RSA, AST_1,
AST_2, and DFHE), which makes it harder for R-HyGNN to predict the label
precisely.
   8
       https://github.com/ChenchengLiang/Horn-graph-dataset
   Moreover, the data distribution differs considerably between the two graph
representations, because the normalization of CHCs introduces new clauses and
arguments. For example, in the simplified CHCs in Figure 1c, there are three
arguments for the relation symbol L, while in the normalized clauses in Figure 2,
there are six arguments for the two relation symbols L and L′. If relation symbols
have many arguments, the difference in data distribution between the two graph
representations becomes even larger. Even though the predicted label in this task
cannot directly help solve the CHC-encoded problem, it is important for studying
the message flows in R-HyGNN.

Task 2: Count occurrence of relation symbols in all clauses. In the scatter plots
in Figure 5, the x- and y-axes show the true and the predicted values on a
logarithmic scale, respectively. The closer the points are to the diagonal line,
the better the prediction of the number of relation symbol occurrences in the
CHCs. Both the CDHG and the constraint graph show good performance (i.e., most
points lie near the diagonal). This syntactic information can be obtained by
counting the CFHE and RSI edges in the CDHG and the constraint graph,
respectively. When the number of nodes is large, the predicted values are less
accurate; we believe this is because graphs with many nodes have a more complex
structure, and there is less training data for them. Moreover, the mean square
error for the CDHG is larger than for the constraint graph, because normalization
increases the number of nodes and the maximum occurrence count of relation symbols
in the CDHG, and the larger the value range, the more difficult the regression
task. Notice that the number of test samples for this task (1115) is smaller than
the size of the test set (1121) implied by Table 3, because the remaining six
graphs do not have an rs node.

Task 3: Relation symbol occurrence in SCCs. The high prediction accuracy for both
graph representations shows that our framework can approximate Tarjan's
algorithm [19]. In contrast to Task 2, even though the CDHG has more nodes than
the constraint graph on average, the CDHG performs better than the constraint
graph, which indicates that the control and data flow in the CDHG help R-HyGNN
learn graph structures. For the same reason as in Task 2, the number of test
samples for this task (1115) is smaller than the size of the test set (1121).

Task 4: Existence of argument bounds. For both graph representations, the accuracy
is much higher than the dominant-label ratio. Our framework can thus predict the
answer to an undecidable problem with high accuracy, which shows its potential for
guiding CHC solvers. The CDHG performs better than the constraint graph, which
might be because predicting argument bounds relies on semantic information. The
number of test samples for this task (1028) is smaller than the size of the test
set (1121) because the remaining 93 graphs do not have an rsa node.
Figure 5: Scatter plots for Task 2; the x- and y-axes are in logarithmic scale.
(a) CDHG: 16858 rs nodes in total, mean square error 4.22. (b) Constraint graph:
11131 rs nodes in total, mean square error 1.04.


Task 5: Clause occurrence in counter-examples. For both Tasks 5 (a) and (b), the
overall accuracy for the two graph representations is high. We manually analysed
some of the predictions by visualizing the (small) graphs9 and identified some
simple patterns learned by R-HyGNN; for instance, the predicted likelihoods are
always high for rs nodes connected to false nodes. One promising result is that
the model can predict all labels perfectly for some big graphs10 containing more
than 290 clauses, which confirms that R-HyGNN learns certain intricate patterns
rather than only simple ones. In addition, the CDHG performs better than the
constraint graph, possibly because semantic information is more important for
solving this task.


7. Related Work
Learning to represent programs. Contextual embedding methods (e.g., Transformers
[29], BERT [30], GPT [31]) have achieved impressive results in natural language
understanding, and some have been adapted to source code understanding in text
format (e.g., CodeBERT [32], CuBERT [33]). However, programming languages usually
contain rich, explicit, and complicated structural information, and their problem
sets (learning targets) [34, 35] differ from those of natural languages.
Therefore, both the program representation and the learning model should be
adapted to the characteristics of programs. Recently, frameworks that combine
structural program representations (graphs or ASTs) with graph- or tree-based deep
learning models have made good progress on programming-language-related problems.
For example, [36] represents a program by a sequence of code subtokens

    9
      https://github.com/ChenchengLiang/Horn-graph-dataset/tree/main/example-analysis/
task5-small-graphs
   10
      https://github.com/ChenchengLiang/Horn-graph-dataset/tree/main/example-analysis/
task5-big-graphs
Table 4
Experimental results for Tasks 1, 3, 4, and 5. Tasks 4 and 5 each consist of two independent
binary classification tasks. Here, + and − stand for the positive and negative label, and T and P
for the true and predicted label. Acc. is the accuracy of the binary classification, and Dom. is the
dominant-label ratio. Notice that even though the two graph representations originate from the same
original CHCs, the label distributions differ, since the CDHG is constructed from normalized CHCs.
  Task    Files   Graph               T+P+     T+P−    T−P+    T−P−      Acc.    Dom.
  1       1121    Constraint graph    93863    0       0       1835971   100%    95.1%
                  CDHG                142598   0       10      381445    99.9%   72.8%
  3       1115    Constraint graph    3204     133     301     7493      96.1%   70.1%
                  CDHG                8262     58      15      8523      99.6%   50.7%
  4 (a)   1028    Constraint graph    13685    5264    2928    71986     91.2%   79.7%
                  CDHG                30845    4557    3566    103630    94.3%   75.2%
  4 (b)   1028    Constraint graph    18888    4792    3291    66892     91.4%   74.8%
                  CDHG                41539    4360    3715    92984     94.3%   67.8%
  5 (a)   386     Constraint graph    1048     281     154     7163      95.0%   84.7%
                  CDHG                1230     206     121     9036      96.9%   86.4%
  5 (b)   383     Constraint graph    3030     558     622     3428      84.6%   53.1%
                  CDHG                3383     481     323     4361      90.6%   54.8%


and predicts source code snippet summaries with a novel convolutional attention
network. Code2vec [37] learns from the paths in a program's AST and predicts
semantic properties of the program using a path-based attention model. [38] uses
the AST to represent programs and classifies them according to functionality with
a tree-based convolutional neural network (TBCNN). Some studies focus on defining
efficient program representations, others on introducing novel learning
architectures; we do both, i.e., we represent CHC-encoded programs by two graph
representations and propose a novel GNN architecture to learn from them.

Deep learning for logic formulas. Since deep learning was first applied to
learning features of logic formulas, an increasing number of studies have explored
graph representations of logic formulas and corresponding learning frameworks,
since logic formulas are highly structured, like programming languages. For
instance, DeepMath [39] was an early attempt to apply text-level learning to logic
formulas to guide the search process of a formal method: neural sequence models
select premises for an automated theorem prover (ATP). Later, FormulaNet [40] used
an edge-order-preserving embedding method to capture the structural information of
higher-order logic (HOL) formulas represented as graphs. Extending FormulaNet,
[41] constructs syntax trees of HOL formulas as structural inputs and uses
message-passing GNNs to learn HOL features that guide theorem proving, predicting
tactics and tactic arguments at every step of the proof. LERNA [42] uses
convolutional neural networks (CNNs) [43] to learn from previous proof search
attempts (logic formulas) represented as graphs, guiding the current proof search
of an ATP. NeuroSAT [44, 45] reads SAT queries (logic formulas) as graphs and
learns their features using different graph embedding strategies (e.g.,
message-passing GNNs [46, 47, 26]) to directly predict satisfiability or to guide
a SAT solver. Following this trend, we introduce R-HyGNN to learn program features
from graph representations of CHCs.

Graph neural networks. Message-passing GNNs [26], such as the graph convolutional
network (GCN) [17], the graph attention network (GAT) [48], and the gated graph
neural network (GGNN) [47], have been applied in several domains, ranging from
predicting molecule properties to learning representations of logic formulas.
However, these frameworks only apply to graphs with binary edges. Some spectral
methods have been proposed to deal with hypergraphs [49, 50]; for instance, the
hypergraph neural network (HGNN) [51] extends the GCN of [52] to handle
hyperedges, and [53] integrates the graph attention mechanism [48] into hypergraph
convolution [52] to further improve performance. However, these methods cannot be
applied directly in the spatial domain. Similar to the fixed-arity predicate
strategy described in LambdaNet [18], R-HyGNN concatenates the representations of
the nodes connected by a hyperedge and updates each representation depending on
the node's position in the hyperedge.


8. Conclusion and Future Work
In this work, we systematically explore learning program features from CHCs using
R-HyGNN and two tailor-made graph representations of CHCs. We use five proxy tasks
to evaluate the framework. The experimental results indicate that our framework
has the potential to guide CHC solvers in analysing Horn clauses. In future work,
we plan, among other things, to use this framework to filter predicates in Horn
solvers that apply the CEGAR model checking algorithm.


References
 [1] J. Leroux, P. Rümmer, P. Subotić, Guiding Craig interpolation with domain-specific
     abstractions, Acta Informatica 53 (2016) 387–424. URL: https://doi.org/10.1007/
     s00236-015-0236-z. doi:10.1007/s00236-015-0236-z.
 [2] Y. Demyanova, P. Rümmer, F. Zuleger, Systematic predicate abstraction using
     variable roles, in: C. Barrett, M. Davies, T. Kahsai (Eds.), NASA Formal Methods,
     Springer International Publishing, Cham, 2017, pp. 265–281.
 [3] E. Clarke, O. Grumberg, S. Jha, Y. Lu, H. Veith, Counterexample-guided abstraction
     refinement, in: E. A. Emerson, A. P. Sistla (Eds.), Computer Aided Verification,
     Springer Berlin Heidelberg, Berlin, Heidelberg, 2000, pp. 154–169.
 [4] V. Tulsian, A. Kanade, R. Kumar, A. Lal, A. V. Nori, MUX: Algorithm selection
     for software model checkers, in: Proceedings of the 11th Working Conference on
     Mining Software Repositories, MSR 2014, Association for Computing Machinery, New
     York, NY, USA, 2014, p. 132–141. URL: https://doi.org/10.1145/2597073.2597080.
     doi:10.1145/2597073.2597080.
 [5] Y. Demyanova, T. Pani, H. Veith, F. Zuleger, Empirical software metrics for bench-
     marking of verification tools, in: J. Knoop, U. Zdun (Eds.), Software Engineering
     2016, Gesellschaft für Informatik e.V., Bonn, 2016, pp. 67–68.
 [6] C. Richter, E. Hüllermeier, M.-C. Jakobs, H. Wehrheim, Algorithm selection for
     software validation based on graph kernels, Automated Software Engineering 27
     (2020) 153–186.
 [7] C. Richter, H. Wehrheim, Attend and represent: A novel view on algorithm selection
     for software verification, in: 2020 35th IEEE/ACM International Conference on
     Automated Software Engineering (ASE), 2020, pp. 1016–1028.
 [8] D. P. Kingma, M. Welling, Auto-encoding variational Bayes, 2013. URL: https:
     //arxiv.org/abs/1312.6114. doi:10.48550/ARXIV.1312.6114.
 [9] H. Dai, Y. Tian, B. Dai, S. Skiena, L. Song, Syntax-directed variational autoencoder
     for structured data, 2018. URL: https://arxiv.org/abs/1802.08786. doi:10.48550/
     ARXIV.1802.08786.
[10] X. Si, A. Naik, H. Dai, M. Naik, L. Song, Code2inv: A deep learning framework
     for program verification, in: Computer Aided Verification: 32nd International
     Conference, CAV 2020, Los Angeles, CA, USA, July 21–24, 2020, Proceedings, Part
     II, Springer-Verlag, Berlin, Heidelberg, 2020, p. 151–164. URL: https://doi.org/10.
     1007/978-3-030-53291-8_9. doi:10.1007/978-3-030-53291-8_9.
[11] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. F. Zambaldi,
     M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, Ç. Gülçehre,
     H. F. Song, A. J. Ballard, J. Gilmer, G. E. Dahl, A. Vaswani, K. R. Allen, C. Nash,
     V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li,
     R. Pascanu, Relational inductive biases, deep learning, and graph networks, CoRR
     abs/1806.01261 (2018). URL: http://arxiv.org/abs/1806.01261. arXiv:1806.01261.
[12] T. Ben-Nun, A. S. Jakobovits, T. Hoefler, Neural code comprehension: A learnable
     representation of code semantics, CoRR abs/1806.07336 (2018). URL: http://arxiv.
     org/abs/1806.07336. arXiv:1806.07336.
[13] C. Lattner, V. Adve, LLVM: a compilation framework for lifelong program
     analysis & transformation, in: International Symposium on Code Generation and
     Optimization, 2004. CGO 2004., 2004, pp. 75–86. doi:10.1109/CGO.2004.1281665.
[14] T. Mikolov, M. Karafiát, L. Burget, J. H. Cernocký, S. Khudanpur, Recurrent neural
     network based language model, in: INTERSPEECH, 2010.
[15] A. Horn, On sentences which are true of direct unions of algebras, The Journal of
     Symbolic Logic 16 (1951) 14–21. URL: http://www.jstor.org/stable/2268661.
[16] M. Olsák, C. Kaliszyk, J. Urban, Property invariant embedding for automated
     reasoning, CoRR abs/1911.12073 (2019). URL: http://arxiv.org/abs/1911.12073.
     arXiv:1911.12073.
[17] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. v. d. Berg, I. Titov, M. Welling, Modeling
     relational data with graph convolutional networks, 2017. URL: https://arxiv.org/
     abs/1703.06103. doi:10.48550/ARXIV.1703.06103.
[18] J. Wei, M. Goyal, G. Durrett, I. Dillig, LambdaNet: Probabilistic type inference
     using graph neural networks, 2020. arXiv:2005.02161.
[19] R. E. Tarjan, Depth-first search and linear graph algorithms, SIAM J. Comput. 1
     (1972) 146–160.
[20] G. Fedyukovich, P. Rümmer, Competition report: CHC-COMP-21, 2021. URL:
     https://chc-comp.github.io/2021/report.pdf.
[21] P. Rümmer, H. Hojjat, V. Kuncak, Disjunctive interpolants for Horn-clause verifica-
     tion (extended technical report), 2013. arXiv:1301.4973.
[22] R. W. Floyd, Assigning meanings to programs, Proceedings of Symposium on Applied
     Mathematics 19 (1967) 19–32. URL: http://laser.cs.umass.edu/courses/cs521-621.
     Spr06/papers/Floyd.pdf.
[23] C. A. R. Hoare, An axiomatic basis for computer programming, Commun. ACM 12
     (1969) 576–580. URL: https://doi.org/10.1145/363235.363259. doi:10.1145/363235.
     363259.
[24] N. Bjørner, A. Gurfinkel, K. McMillan, A. Rybalchenko, Horn clause solvers for
     program verification, 2015, pp. 24–51. doi:10.1007/978-3-319-23534-9_2.
[25] H. Hojjat, P. Ruemmer, The ELDARICA Horn solver, in: 2018 Formal Methods
     in Computer Aided Design (FMCAD), 2018, pp. 1–7. doi:10.23919/FMCAD.2018.
     8603013.
[26] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, Neural message
     passing for quantum chemistry, CoRR abs/1704.01212 (2017). URL: http://arxiv.
     org/abs/1704.01212. arXiv:1704.01212.
[27] K. Xu, W. Hu, J. Leskovec, S. Jegelka, How powerful are graph neural net-
     works?, CoRR abs/1810.00826 (2018). URL: http://arxiv.org/abs/1810.00826.
     arXiv:1810.00826.
[28] C. Liang, P. Rümmer, M. Brockschmidt, Exploring representation of horn clauses us-
     ing gnns, 2022. URL: https://arxiv.org/abs/2206.06986. doi:10.48550/ARXIV.2206.
     06986.
[29] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser,
     I. Polosukhin, Attention is all you need, 2017. arXiv:1706.03762.
[30] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional
     transformers for language understanding, CoRR abs/1810.04805 (2018). URL:
     http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
[31] A. Radford, K. Narasimhan, Improving language understanding by generative
     pre-training, 2018.
[32] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu,
     D. Jiang, M. Zhou, CodeBERT: A pre-trained model for programming and natural
     languages, CoRR abs/2002.08155 (2020). URL: https://arxiv.org/abs/2002.08155.
     arXiv:2002.08155.
[33] A. Kanade, P. Maniatis, G. Balakrishnan, K. Shi, Pre-trained contextual embedding
     of source code, CoRR abs/2001.00059 (2020). URL: http://arxiv.org/abs/2001.00059.
     arXiv:2001.00059.
[34] H. Husain, H. Wu, T. Gazit, M. Allamanis, M. Brockschmidt, CodeSearchNet
     challenge: Evaluating the state of semantic code search, CoRR abs/1909.09436
     (2019). URL: http://arxiv.org/abs/1909.09436. arXiv:1909.09436.
[35] S. Lu, D. Guo, S. Ren, J. Huang, A. Svyatkovskiy, A. Blanco, C. B. Clement,
     D. Drain, D. Jiang, D. Tang, G. Li, L. Zhou, L. Shou, L. Zhou, M. Tufano, M. Gong,
     M. Zhou, N. Duan, N. Sundaresan, S. K. Deng, S. Fu, S. Liu, CodeXGLUE: A
     machine learning benchmark dataset for code understanding and generation, CoRR
     abs/2102.04664 (2021). URL: https://arxiv.org/abs/2102.04664. arXiv:2102.04664.
[36] M. Allamanis, H. Peng, C. Sutton, A convolutional attention network for extreme
     summarization of source code, CoRR abs/1602.03001 (2016). URL: http://arxiv.org/
     abs/1602.03001. arXiv:1602.03001.
[37] U. Alon, M. Zilberstein, O. Levy, E. Yahav, Code2vec: Learning distributed
     representations of code, Proc. ACM Program. Lang. 3 (2019) 40:1–40:29. URL:
     http://doi.acm.org/10.1145/3290353. doi:10.1145/3290353.
[38] L. Mou, G. Li, Z. Jin, L. Zhang, T. Wang, TBCNN: A tree-based convolutional
     neural network for programming language processing, CoRR abs/1409.5718 (2014).
     URL: http://arxiv.org/abs/1409.5718. arXiv:1409.5718.
[39] G. Irving, C. Szegedy, A. A. Alemi, N. Een, F. Chollet, J. Urban, DeepMath - deep
     sequence models for premise selection, in: D. D. Lee, M. Sugiyama, U. V. Luxburg,
     I. Guyon, R. Garnett (Eds.), Advances in Neural Information Processing Systems 29,
     Curran Associates, Inc., 2016, pp. 2235–2243. URL: http://papers.nips.cc/paper/
     6280-deepmath-deep-sequence-models-for-premise-selection.pdf.
[40] M. Wang, Y. Tang, J. Wang, J. Deng, Premise selection for theorem proving by deep
     graph embedding, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus,
     S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Sys-
     tems 30, Curran Associates, Inc., 2017, pp. 2786–2796. URL: http://papers.nips.cc/
     paper/6871-premise-selection-for-theorem-proving-by-deep-graph-embedding.pdf.
[41] A. Paliwal, S. M. Loos, M. N. Rabe, K. Bansal, C. Szegedy, Graph representations
     for higher-order logic and theorem proving, CoRR abs/1905.10006 (2019). URL:
     http://arxiv.org/abs/1905.10006. arXiv:1905.10006.
[42] M. Rawson, G. Reger, A neurally-guided, parallel theorem prover, in: A. Herzig,
     A. Popescu (Eds.), Frontiers of Combining Systems, Springer International Publishing,
     Cham, 2019, pp. 40–56.
[43] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep con-
     volutional neural networks, in: F. Pereira, C. J. C. Burges, L. Bottou, K. Q.
     Weinberger (Eds.), Advances in Neural Information Processing Systems, volume 25,
     Curran Associates, Inc., 2012. URL: https://proceedings.neurips.cc/paper/2012/file/
     c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
[44] D. Selsam, N. Bjørner, Neurocore: Guiding high-performance SAT solvers with
     unsat-core predictions, CoRR abs/1903.04671 (2019). URL: http://arxiv.org/abs/
     1903.04671. arXiv:1903.04671.
[45] D. Selsam, M. Lamm, B. Bünz, P. Liang, L. de Moura, D. L. Dill, Learning
     a SAT solver from single-bit supervision, CoRR abs/1802.03685 (2018). URL:
     http://arxiv.org/abs/1802.03685. arXiv:1802.03685.
[46] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph
     neural network model, IEEE Transactions on Neural Networks 20 (2009) 61–80.
     doi:10.1109/TNN.2008.2005605.
[47] Y. Li, D. Tarlow, M. Brockschmidt, R. Zemel, Gated graph sequence neural networks,
     2015. URL: https://arxiv.org/abs/1511.05493. doi:10.48550/ARXIV.1511.05493.
[48] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, Y. Bengio, Graph
     attention networks, 2017. URL: https://arxiv.org/abs/1710.10903. doi:10.48550/
     ARXIV.1710.10903.
[49] H. Gui, J. Liu, F. Tao, M. Jiang, B. Norick, J. Han, Large-scale embedding learning
     in heterogeneous event data, 2016, pp. 907–912. doi:10.1109/ICDM.2016.0111.
[50] K. Tu, P. Cui, X. Wang, F. Wang, W. Zhu, Structural deep embedding for hyper-
     networks, CoRR abs/1711.10146 (2017). URL: http://arxiv.org/abs/1711.10146.
     arXiv:1711.10146.
[51] Y. Feng, H. You, Z. Zhang, R. Ji, Y. Gao, Hypergraph neural networks, CoRR
     abs/1809.09401 (2018). URL: http://arxiv.org/abs/1809.09401. arXiv:1809.09401.
[52] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional
     networks, CoRR abs/1609.02907 (2016). URL: http://arxiv.org/abs/1609.02907.
     arXiv:1609.02907.
[53] S. Bai, F. Zhang, P. H. S. Torr, Hypergraph convolution and hypergraph attention,
     2020. arXiv:1901.08150.