<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Seventh Workshop on Practical Aspects of Automated Reasoning, June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Learning Precedences from Simple Symbol Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Filip Bártek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Suda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Czech Technical University in Prague - Czech Institute of Informatics</institution>
          ,
          <addr-line>Robotics and Cybernetics. Jugoslávských partyzánů 1580/3, 160 00 Praha 6 - Dejvice</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Czech Technical University in Prague - Faculty of Electrical Engineering.</institution>
          <addr-line>Technická 2, 166 27 Praha 6 - Dejvice</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>2</volume>
      <fpage>9</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>A simplification ordering, typically specified by a symbol precedence, is one of the key parameters of the superposition calculus, contributing to shaping the search space navigated by a saturation-based automated theorem prover. Thus the choice of a precedence can have a great impact on the prover's performance. In this work, we design a system for proposing symbol precedences that should lead to solving a problem quickly. The system relies on machine learning to extract this information from past successful and unsuccessful runs of a theorem prover over a set of problems and randomly sampled precedences. It uses a small set of simple human-engineered symbol features as the sole basis for discriminating the symbols. This allows for a direct comparison with precedence generation schemes designed by prover developers.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;saturation-based theorem proving</kwd>
        <kwd>simplification orderings</kwd>
        <kwd>symbol precedences</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern saturation-based automated theorem provers (ATPs) such as E [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], SPASS [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or
Vampire [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] use the superposition calculus [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] as their underlying inference system. Superposition is
built around the paramodulation inference [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] crucially constrained by simplification ordering
on terms and literals, which is supplied as a parameter of the calculus. Both of the two main
classes of simplification orderings used in practice, i.e., the Knuth-Bendix Ordering [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and
the Lexicographic Path Ordering [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], are mainly determined by a symbol precedence, a (partial)
ordering on the signature symbols.1
      </p>
      <p>
        While the superposition calculus is known [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to be refutationally complete for any
simplification ordering, the choice of the precedence may have a significant impact on how long
it takes to solve a given problem. In a well-known example, prioritizing in the precedence
the predicates introduced during the Tseitin transformation of an input formula [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] exposes
the corresponding literals to resolution inference during early stages of the proof search, with
the efect of essentially undoing the transformation and thus threatening with an exponential
blow-up that the transformation is designed to prevent [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. ATPs typically ofer a few heuristic
schemes for generating the symbol precedences. For example, the successful invfreq scheme
in E [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] orders the symbols by the number of occurrences in the input problem, prioritizing
symbols that occur the least often for early inferences. Experiments with random precedences
have shown that the existing schemes often fail to come close to the optimum precedence [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
revealing there is a large potential for further improvements.
      </p>
      <p>
        In this work, we design a system that, when presented with a First-Order Logic (FOL) problem,
proposes a symbol precedence that will likely lead to solving the problem quickly. The system
relies on the techniques of supervised machine learning and extracts such theorem-proving
knowledge from successful (and unsuccessful) runs of the Vampire theorem prover [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] when
run over a variety of FOL problems equipped with randomly sampled symbol precedences.
We assume that by learning to solve already solvable problems quickly, the acquired knowledge
will generalize and help solving problems previously out of reach. As a first step in a more
ambitious project, we focus here on representing the symbols in a problem by a fixed set
of simple human-engineered features (such as the number of occurrences used by invfreq
scheme mentioned above)2 and, to simplify the experimental setup, we restrict our attention
to learning precedences for predicate symbols only.3
      </p>
      <p>Learning to predict good precedences poses several interesting challenges that we address
in this work. First, it is not immediately clear how to characterize a precedence, a permutation
of a finite set of symbols, by a real-valued feature vector to serve as an input for a learning
algorithm. Additionally, to be able to generalize across problems we need to do it in a way
which does not presuppose a fixed signature size. There is also a complication, when sampling
diferent problems, that some problems may be easy to solve for almost every precedence and
others hard. In theorem proving, running times typically vary considerably. Finally, even with
a regression model ready to predict the prover’s performance under a particular precedence  ,
we still need to solve the task of finding an optimum precedence  * according to this model,
which cannot be simply solved by enumerating all the permutations and running the prediction
for each due to their huge number.</p>
      <p>
        Our way of addressing the above sketched challenges lies in using pairwise symbol preferences
to characterize a precedence, normalizing the target prover run times on a per problem basis,
and in the use of “second-order” learning of the preferences for symbols abstracted by their
features. These concepts are introduced in Section 3 and later formalized in Section 4. Section 5
presents the results of our experimental evaluation of the proposed technique over the TPTP [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
benchmark. We start our exposition by fixing the notation and basic concepts in Section 2.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>We assume that the reader is familiar with basic concepts used in first order logic (FOL) theorem
proving. We use this section to recall and formalize the key notions relevant for our work.
Problem A (first-order) problem is a pair  = (Σ, Cl ), where Σ = (1, 2, . . . , ) is a list
of (predicate and function) symbols called the signature, and Cl is a set of first-order clauses</p>
      <sec id="sec-2-1">
        <title>2Automatic feature extraction using neural networks is planned for future work. 3Our theoretical considerations, however, apply equally to learning function symbol precedences.</title>
        <p>built over the symbols of Σ.</p>
        <p>
          The problem is either given directly by the user or could be the result of clausifying a general
FOL formula  , in which case we know which of the symbols were introduced during the
clausiifcation (namely during Tseitin transformation and skolemization; see e.g. Nonnengart and
Weidenbach [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]) and which occurred in the conjecture (if it was present).
        </p>
        <p>
          Precedence Given a problem  = (Σ, Cl ) with Σ = (1, 2, . . . , ), a precedence   is
a permutation, i.e. a bijective mapping, of the set of indices {1, . . . , }. A precedence  
determines a (total) ordering on Σ as follows:   (1) &lt;   (2) &lt; . . . &lt;   ().
Simplification Orderings are orderings on terms used to parameterize the superposition
calculus [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] employed by modern saturation-based theorem provers. The two classes of
simplification orderings most commonly used in practice, the Knuth-Bendix Orderings [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and
the Lexicographic Path Orderings [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], are both defined in terms of a user-supplied (possibly
partial) ordering &lt; on the given problem’s signature Σ. In this work, we assume that the
theorem prover uses a simplification ordering from one of these two classes relying on the ordering
on Σ determined by a precedence   to construct such a simplification ordering.
Performance measure A saturation-based ATP solves a problem  = (Σ, Cl ) (under a
particular fixed strategy and a determined symbol precedence   ) by either
• deriving from Cl a contradiction in the form of the empty clause, in which case  is
shown unsatisfiable , or
• finitely saturating the set of clauses Cl without deriving the contradiction, in which case
 is shown satisfiable .4
In both cases, we take the number of iterations of the employed saturation algorithm (see,
e.g., Riazanov and Voronkov [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] for an overview) as a measure of the efort that the ATP
took to solve the problem. We refer to this measure as the abstract solving time and denote it
ast (,   ).5
        </p>
        <p>In practice, an ATP can also run out of resources, typically out of the allocated time. In that
case, the abstract solving time is undefined : ast (,   ) = ⊥. While it may happen that running
an ATP with the same problem and symbol precedence two times yields a diferent result each
time (namely succeeding one time and failing another time), such cases are rare and we ensure
they do not interfere with the learning process by caching the results.</p>
        <p>Order matrix Given a permutation  of the set of indices {1, . . . , }, the order matrix O ( )
is a binary matrix of size  ×  defined in the following manner:</p>
        <p>O ( ), = q − 1() &lt;  − 1()y ,</p>
      </sec>
      <sec id="sec-2-2">
        <title>4We assume a refutationally complete calculus and saturation strategy.</title>
        <p>
          5The advantage of using abstract solving time is that it does not depend on the hardware used for the
computation.
where we use J K to denote the Iverson bracket [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] applied to a proposition  , evaluating to
1 if  is true, and 0 otherwise. In other words, for a symbol precedence   , (  ), = 1 if
the precedence   orders the symbol  before the symbol  , and (  ), = 0 otherwise.
        </p>
        <sec id="sec-2-2-1">
          <title>Flattened matrix</title>
          <p>lfattening  :</p>
          <p>Given a matrix  of size  × →,− is the vector of length 2 obtained by</p>
          <p>→− (− 1)+ = ,
for every ,  ∈ {1, . . . , }. For our use the exact way of mapping the matrix elements to
the vector indices is not important. We mostly just need a vector representation of the data
contained in the given matrix to have access to the dot product operation.</p>
          <p>Linear regression is an approach to modeling the relationship between scalar target values
 ∈ R and one or more input variables x = (1 , . . . , ),  = 1, . . . , . The relationship is
modeled using a linear predictor function:
whose unknown model parameters w ∈ R and  ∈ R are estimated from the data. We call
the vector w the coeficients of the model and  the intercept. Most commonly, the parameters
are picked to minimize the so-called mean squared error:</p>
          <p>^ = x · w + ,
MSE = 1 ∑︁(^ − )2,</p>
          <p>
            =1
but other norms are also possible [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ].
          </p>
          <p>
            Basic assumptions For the discussion that follows, we assume a fixed ATP that uses the
superposition calculus with a simplification ordering parameterized by a symbol precedence.
While the practical experiments described in Section 5 use the ATP Vampire [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], the model
architecture does not assume a particular ATP and is compatible with any superposition-based
ATP such as E [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], SPASS [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] or Vampire. Within the prover a particular saturation strategy is
ifxed including a time limit.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. General considerations and overview</title>
      <p>The aim of this work is to design a system that learns to suggest good symbol precedences
to an ATP from observations of the ATP’s performance on a class train of problems with
randomly sampled precedences. Given a problem  = (Σ, Cl ) with | Σ | = , we consider
a precedence   good, if it leads to a low ast (,   ) among the ! possible precedences for
 . Note that for problems with a signature with more than a few symbols, repeatedly running
the prover with random precedences represents an efectively infinite source of training data.</p>
      <p>
        Ideally, we would like to learn general theorem proving knowledge, not too dependent
on train , which could be later explained and compared to precedence generation schemes
manually designed by the prover developers. Let us quickly recall one such scheme, already
mentioned in the introduction, called invfreq in E [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The prover’s manual [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] explains:
      </p>
      <sec id="sec-3-1">
        <title>Sort symbols by frequency (frequently occurring symbols are smaller).</title>
        <p>What is common to basically all manually designed schemes, is that they pick a certain scalar
property of symbols (here it is the symbol frequency, i.e. the number of occurrences of the symbol
in the given problem) and obtain a precedence by sorting the symbols using that property.
Decomposing We might want our system to also learn a certain property of symbols and
use sorting to generate and suggest a precedence. However, it is not clear how to “extract”
such property from the observed data, since we only have access to the target values for
full precedences. Our idea for “decomposing” these values into pieces that somehow relate
to individual symbols (and can thus be “transferred” across problems) is to take a detour using
symbol pairs: we assume that the performance of the ATP on  given   , i.e. our measure
ast (,   ), can be predicted from a sum of individual contributions corresponding to facts
of the form</p>
        <p>orders the symbol  before the symbol  .</p>
        <p>This is in line with how a prover developer could reason about a precedence generating scheme:
Even when it is not clear how good or bad a symbol is in absolute terms, one might have
an intuition that a symbol from a certain class should preferably come before a symbol from
another class in the precedence (e.g., symbols introduced during clausification should typically
be smaller than others) and assign some weight to this piece of intuition.</p>
        <p>In Section 4.2 we formalize this idea using the notion of preference matrix and show how, for
each problem in isolation, such preference matrix can be obtained in the form of coeficients
learned by linear regression.</p>
        <p>Learning across problems Symbol preferences learned on a particular problem are
inherently tied to that problem and do not immediately carry over to other problems. The main reason
for this is that symbols themselves only appear in the context of a particular problem.6 That is
why we resort to representing symbols by their features (cf. Section 4.3.2) when aggregating
the learned preferences across diferent problems. This is in more detail explained further below
and, formally, in Section 4.3.</p>
        <p>We also strive to ensure that the preference values across problems have possibly the same
magnitude. Note that ast (,   ) may vary a lot for a fixed problem  but all the more so
across problems. To obtain commensurable values, we normalize (see Section 4.1) the prover
performance data on a per problem basis before learning the preferences. Normalization also
deals with supplying a concrete value to those runs which did not finish, i.e. have ast (,   ) =
⊥.</p>
        <p>
          6On certain benchmarks, such as those coming from translations of mathematical libraries [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], symbols
maintain identity and meaning across individual problems. However, since our goal in this work is to learn general
theorem proving knowledge, we do not use the assumption of aligned signatures.
“Second-order” regression Once the symbols are abstracted by their feature vectors, we
can collect symbol preferences from all the tested problems and turn this collection into another
regression task. Note that at this moment, the preferences, which were obtained as the
coeficients learned by linear regression, themselves become the regression target. Thus, in a certain
sense, we now do second-order learning. It should be stressed though, that while the learning
of the preferences requires a linear regression model by design, this second-order regression
does not need to be linear and more sophisticated models can be experimented with.
        </p>
        <p>The details of this step are given in Section 4.3.</p>
        <p>Preference prediction and optimization Once the second-order model has been learned,
we can predict preferences for any pair of symbols based on their feature vectors and thus also
predict, given a problem  , how many steps will a prover require to solve it using a particular
precedence   . (For this second step, we reverse the idea of decomposition: we sum up those
predicted preferences that correspond to pairs of symbols ,  such that   orders the symbol
 before the symbol  – see Section 4.4 for details).</p>
        <p>
          Having access to an estimate of performance for each precedence   , the final step is to look
for a precedence  * that ideally minimizes the predicted performance measure over all the !
possible precedences on  ’s signature. Since finding the true optimum could be computationally
hard, we resort to using an approximation algorithm by Cohen et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>The algorithm is recalled in Section 4.4.2.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Architecture</title>
      <p>4.1. Values of precedences
We define the base cost value cost base (,   ) of precedence   on problem  according to
the outcome of the proof search configured to use this precedence:
• If the proof terminates successfully, cost base (  ) is the number of iterations of the
saturation loop started during the proof search: cost base (  ) = ast (,   ).
• If the proof search fails (meaning that ast (,   ) = ⊥), then cost base (  ) is the
maximum number of saturation loop iterations encountered in successful training proof
searches on this problem: cost base (  ) = max ′ ∈Π+ ast (,  ′ ), where Π+ is the set of
all training precedences on problem  that yield a successful proof search.</p>
      <p>We further normalize the cost values by the following operations:
1. Logarithmic scaling: For each solvable problem, running proof search with uniformly
random predicate precedences reveals a distribution of abstract solving times on successful
executions. Examining these distributions for various problems suggests that they are
usually approximately log-normal. To make further scaling by standardization reasonable,
we first transform the base costs by taking their logarithm.
2. Standardization: Independently for each problem, we apply an afine transformation
so that the resulting cost values have the mean 0 and standard deviation 1. This ensures
that the values are comparable across problems.
Let cost std (  ) denote the resulting cost value of precedence   after the scaling and
standardization.
4.2. Problem preference matrix learning
Given a problem  with  symbols, a preference matrix  is any matrix over R of size  × .
We define the proxy cost of precedence   under preference  to be the sum of the preference
values  , of all symbol pairs ,  ordered by   such that  comes before  :
→−− − −O (  ) →·−−
cost proxy (  ,  ) = ∑︁ q − 1() &lt;  − 1()y  , =</p>
      <p>,
whe→r−e−− −O (  ) →·−− is the dot product of the flattened matrices O (  ) and  .</p>
      <p>For any given problem we can uniformly sample precedences   to form the training set
 = {( 1 , cost std ( 1 )), ( 2 , cost std ( 2 )), . . . , ( , cost std ( ))}. Having such training set
allows us to find a vector →−− that minimizes the mean square error
1</p>
      <p>∑︁
(  ,coststd (  ))∈</p>
      <p>(cost proxy (  ,  ) − cost std (  ))2
by linear regression.</p>
      <p>
        Minimizing the mean square error directly may lead to overfitting to the training set, especially
in problems whose signature is relatively large in comparison to the size of the training set.
To improve generalization, we use the Lasso regression algorithm [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] instead of standard linear
regression. We use cross-validation to set the value of the regularization hyperparameter.7
      </p>
      <p>Another reason to use the Lasso algorithm is that it performs regularization by imposing
a penalty on coeficients with large absolute value, efectively shrinking the coeficients that
correspond to symbol pairs whose mutual order does not afect the cost std (  ). We can use
this property to interpret the absolute value of preference value as a measure of the importance
of a given symbol pair.</p>
      <p>In the following sections we assume that the preference matrix  we find by Lasso regression
yields cost proxy that approximates cost std well.
4.3. General preference matrix learning
We proceed to cast the task of finding a good preference matrix  for an arbitrary problem
as a regression on feature vector representations of symbol pairs. To accomplish this we need
to be able to represent each pair of symbols by a feature vector and to know target preference
values for pairs of symbols in a training problem set.</p>
      <sec id="sec-4-1">
        <title>4.3.1. Target preference values</title>
        <p>For each problem  in the training problem set, we find a problem preference matrix by
the method outlined in Section 4.2. The target value of an arbitrary pair of symbols ,  in 
is  , .</p>
        <p>
          7See the model LassoCV in the machine learning library scikit-learn [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.3.2. Symbol pair embedding</title>
        <p>We represent each symbol by a numeric feature vector that consists of the following
components: symbol arity, the number of symbol occurrences in the problem, the number of clauses
in the problem that contain at least one occurrence of the symbol, an indicator of occurrence
in a conjecture clause, an indicator of occurrence in a unit clause, and an indicator of being
introduced during clausification. This choice of symbol features is motivated by the fact that
they are readily available in Vampire and that they sufice as a basis for common precedence
generation schemes, such as the invfreq scheme. We denote the feature vector corresponding
to symbol  as fv ().</p>
        <p>We represent a pair of symbols ,  by the concatenation of their feature vectors [fv (), fv ()].</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3.3. Training data</title>
        <p>The general preference regressor is trained on samples of the following structure:
• the input: [fv (), fv ( )] – the embedding of a symbol pair ,  in problem  ,
• the target:  , – an element of the preference matrix we learned for problem 
corresponding to the symbol pair (,  ).</p>
        <p>We sample problem  from the training problem set with uniform probability.</p>
        <p>Thanks to how  is constructed (see Section 4.2), preference values close to 0 are associated
with symbol pairs whose mutual order has little efect on the outcome of the proof search.
To focus the training on the symbol pairs whose order does matter, we weight the samples
by the absolute value of the target. More precisely, given a problem  , the probability of
sampling the symbol pair ,  is proportional to the absolute target value | , |. Experiments
have shown that using sample weighting improves the performance of the resulting model (see
Section 5.2).</p>
        <p>We denote the trained model as  and its prediction of the preference value of the symbol
pair ,  as  ([fv (), fv ( )]).
4.4. Precedence construction
When presented a new problem  = (Σ, Cl ), we propose a symbol precedence by taking
the following steps:
1. Estimate a preference matrix ̂︂ .</p>
        <p>2. Construct a precedence  ̂︁ that approximately minimizes cost proxy ( ̂︁ , ̂︂ ).</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4.1. Preference matrix construction</title>
        <p>To construct a preference matrix ̂︂ for a new problem  , we evaluate the general preference
regressor on the feature vectors of all symbol pairs in  . More specifically,</p>
        <p>̂︂ , =  ([fv (), fv ( )])
for all ,  ∈ Σ.</p>
        <p>At this moment, one can use ̂︂ to estimate the cost of an arbitrary symbol precedence.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.4.2. Precedence construction from preference matrix</title>
        <p>
          The remaining task is, given a preference matrix ̂︂ , to find a precedence  ̂︁ that minimizes
cost proxy ( ̂︁ , ̂︂ ). Since this task is NP-hard in general [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], we rely in this work on a greedy
2-approximation algorithm proposed by Cohen et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The rest of this section provides
a brief description of the algorithm.
        </p>
        <p>The algorithm maintains a partially constructed symbol precedence p ∈ N* (a finite sequence
over N; initially empty), a set of available symbols Σavail ⊆ Σ (initially the whole Σ) and
a potential value for each of the symbols  : Σavail → R. The potential value of a symbol
corresponds to the relative increase in proxy cost associated with selecting the symbol as the next
to append to the partial precedence:
() =</p>
        <p>∑︁
∈Σavail
̂︂ , −</p>
        <p>∑︁
∈Σavail
̂︂ ,</p>
        <p>In each iteration, a symbol  with the smallest potential is selected from Σavail . This symbol
is removed from Σavail and its index  is appended to the partial precedence p. The potentials
of the remaining symbols in Σavail are updated. This process is repeated until all symbols have
been selected, yielding the final p as  ̂︁ .</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>5.1. Setup
Since the simplification orderings under consideration (LPO and KBO) never use the symbol
precedence to compare a predicate symbol with a function symbol, we can break down the
symbol precedence into a predicate precedence and a function precedence. In this paper, we restrict
our attention to predicate precedences, leaving function symbols to be ordered by the invfreq
scheme. A more thorough evaluation of both predicate and function precedences and their
interaction is left for future work.</p>
      <p>
        We use problems from the TPTP library v7.2.0 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for the evaluation. Let train be the set
of all FOL and CNF problems in TPTP with at most 200 predicate symbols such that at least
1 out of 24 random predicate precedences leads to a successful proof search (|train | = 8217).
Let test be the set of all FOL and CNF problems in TPTP with at most 1024 predicate symbols
(|test | = 15751). In each of 5 evaluation iterations (splits), we sample 1000 training problems
from train and 1000 test problems from test uniformly in a way that ensures that the sets
do not overlap. We repeat the evaluation 5 times to evaluate the stability of the training.
      </p>
      <p>On each training problem we run Vampire with 100 uniformly random predicate precedences
and a strategy fixed up to the predicate precedence. 8 We limit the time to 10 seconds per
execution which is in our experience with Vampire suficient to exhibit interesting behavior. Note that
we use a customized version of Vampire to extract a symbol table from each of the problems.9
8Time limit: 10 seconds, memory limit: 8192 MB, literal comparison mode: predicate, function symbol
precedence: invfreq, saturation algorithm: discount, age-weight ratio: 1:10, AVATAR: disabled.</p>
      <p>9https://github.com/filipbartek/vampire/tree/926154f2</p>
      <p>After we fit a preference matrix on each of the training problems (see Section 4.2), we create
a batch of 106 symbol pair feature vectors with target values to train the general preference
regressor (see Section 4.3). We evaluate the trained model by running Vampire on the test
problem set with predicate precedences proposed by the trained model, counting the number
of successfully solved problems.</p>
      <p>A collection of scripts created for the experimental evaluation can be found in the Git
repository at https://github.com/filipbartek/vampire-ml/tree/75c693f3. The measurements presented
below can be performed by running the script map-reduce/paar2020/run.sh.
5.2. Experimental results</p>
      <sec id="sec-5-1">
        <title>We trained two types of general preference regressors (see Section 4.3):</title>
        <p>
          • Elastic-Net – a linear regression model with L1 and L2 norm regularization;
see ElasticNetCV in Pedregosa et al. [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]
• Gradient Boosting regressor – see GradientBoostingRegressor in Pedregosa et al.
        </p>
        <p>
          [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]
        </p>
        <p>We compared the performance of the regressors with three baseline precedence generation
schemes – random precedence, best of 10 random precedences and the invfreq scheme. Table 1
shows the results of evaluation on 1000 problems for 5 random choices of training and test
problem set (splits).</p>
        <p>The case “Elastic-Net without sample weighting” shows the efect of sampling the symbol
pairs uniformly. Inspection of the trained feature coeficients reveals that the fitting ends up
with an all-zero feature weight vector on splits 1 and 2, signifying a complete failure to learn
on these training sets.</p>
        <p>Using Elastic-Net for general preference prediction on average nearly matches the
performance of Vampire with the invfreq precedence scheme. While Elastic-Net performs
significantly better than a random precedence generator, it still performs significantly worse than
a generator that, given a problem, tries 10 random precedences and chooses the best of these.
This suggests that there is space for improvement, possibly with a more sophisticated,
nonlinear model. Plugging in a Gradient Boosting regressor does not show immediate improvement
so more elaborate feature extraction may be necessary.
5.3. Feature coeficients
Since Elastic-Net is a linear regression model, we can easily inspect the coeficients it assigns to
the input features (see Section 4.3.2). In each of the five splits, the final coeficients of the three
indicator features (namely the indicators of presence in a conjecture clause, presence in a unit
clause and being introduced during clausification) are 0. Table 2 shows the fitted non-zero
coeficients of the remaining features. The coeficients were scaled so that their absolute values
sum up to 1. Note that scaling the coeficients by a constant does not afect the precedence
constructed using the greedy algorithm presented in Section 4.4.2.</p>
        <p>It is worth pointing out that the regressor fitted on the whole train and on the training sets 1, 2
and 4 assigns a high preference value to symbol pairs (, ) such that  has a higher frequency and
unit frequency than . Since unit frequency is positively correlated with frequency, minimizing
cost proxy using this fitted regressor is consistent with the invfreq precedence generating
scheme (ordering the symbols by frequency in descending order). Similarly, the model fitted on
training sets 0 and 3 corresponds to ordering the symbols by arity in ascending order.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper is, to the best of our knowledge, a first attempt to use machine learning for proposing
symbol precedences for an ATP. This appears to be a potentially highly rewarding task with
an access to efectively unlimited amount of training data generated on demand. Nevertheless,
the journey from evaluating the prover on random precedences to proposing a good precedence
when presented with a new problem is not straightforward and several conceptual gaps need
to be bridged to connect these two tasks algorithmically.</p>
      <p>In this paper, we proposed a connection using the concept of pairwise symbol preferences
that, as we have shown, can be learned as the coeficients of a linear regression model for which
an order matrix provides the features of a precedence understood as a permutation. In a second
stage, in which symbols are abstracted by their features, the preferences themselves become
regression targets.</p>
      <p>
        In our initial experiments reported in this paper, the performance of our system does not
yet reach that of the human-designed heuristic invfreq. We believe, however, that further
improvements are possible by using a more advanced regression model for the second stage
and/or by further hyper-parameter tuning (e.g. of the Gradient Boosting model). Ultimately,
we expect to gain the most by using a richer set of symbol features, ideally automatically
extracted from the problems using graph neural networks [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>Supported by the ERC Consolidator grant AI4REASON no. 649043 under the EU-H2020
programme, the Czech Science Foundation project 20-06390Y and the Grant Agency of the Czech
Technical University in Prague, grant no. SGS20/215/OHK3/3T/37.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cruanes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vukmirović</surname>
          </string-name>
          , Faster, higher,
          <source>stronger: E 2</source>
          .3, in: P. Fontaine (Ed.),
          <source>Automated Deduction - CADE 27, number 11716 in Lecture Notes in Computer Science</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>495</fpage>
          -
          <lpage>507</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -29436-6_
          <fpage>29</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Weidenbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fietzke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Suda</surname>
          </string-name>
          , P. Wischnewski,
          <source>SPASS version 3</source>
          .5, in: R. A.
          <string-name>
            <surname>Schmidt</surname>
          </string-name>
          (Ed.),
          <source>Automated Deduction - CADE-22, number 5663 in Lecture Notes in Computer Science</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>140</fpage>
          -
          <lpage>145</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>642</fpage>
          -02959-2_
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovács</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Voronkov</surname>
          </string-name>
          ,
          <article-title>First-order theorem proving and Vampire</article-title>
          , in: N.
          <string-name>
            <surname>Sharygina</surname>
          </string-name>
          , H. Veith (Eds.), Computer Aided Verification, Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2013</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>642</fpage>
          -39799-
          <issue>8</issue>
          _
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nieuwenhuis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rubio</surname>
          </string-name>
          ,
          <article-title>Paramodulation-based theorem proving</article-title>
          , in: J. A.
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Voronkov (Eds.),
          <source>Handbook of Automated Reasoning (in 2 volumes)</source>
          ,
          <article-title>Elsevier and</article-title>
          MIT Press,
          <year>2001</year>
          , pp.
          <fpage>371</fpage>
          -
          <lpage>443</lpage>
          . doi:
          <volume>10</volume>
          .1016/b978-044450813-3/
          <fpage>50009</fpage>
          -6.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wos</surname>
          </string-name>
          ,
          <article-title>Paramodulation and theorem-proving in first-order theories with equality</article-title>
          , in: J. H.
          <string-name>
            <surname>Siekmann</surname>
          </string-name>
          , G. Wrightson (Eds.),
          <source>Automation of Reasoning: 2: Classical Papers on Computational Logic</source>
          <year>1967</year>
          -1970, Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>1983</year>
          , pp.
          <fpage>298</fpage>
          -
          <lpage>313</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>642</fpage>
          -81955-1_
          <fpage>19</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Knuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Bendix</surname>
          </string-name>
          ,
          <article-title>Simple word problems in universal algebras</article-title>
          , in: J. H.
          <string-name>
            <surname>Siekmann</surname>
          </string-name>
          , G. Wrightson (Eds.),
          <source>Automation of Reasoning: 2: Classical Papers on Computational Logic</source>
          <year>1967</year>
          -1970, Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>1983</year>
          , pp.
          <fpage>342</fpage>
          -
          <lpage>376</lpage>
          . doi:
          <volume>10</volume>
          . 1007/978-3-
          <fpage>642</fpage>
          -81955-1_
          <fpage>23</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Kamin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lévy</surname>
          </string-name>
          ,
          <article-title>Two generalizations of the recursive path ordering</article-title>
          ,
          <year>1980</year>
          . URL: http: //www.cs.tau.ac.il/~nachumd/term/kamin-levy80spo.pdf, unpublished letter to Nachum Dershowitz.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovács</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Voronkov</surname>
          </string-name>
          ,
          <article-title>On transfinite Knuth-Bendix orders</article-title>
          , in: N.
          <string-name>
            <surname>Bjørner</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Sofronie-Stokkermans</surname>
          </string-name>
          (Eds.),
          <source>Automated Deduction - CADE-23, number 6803 in Lecture Notes in Computer Science</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2011</year>
          , pp.
          <fpage>384</fpage>
          -
          <lpage>399</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>642</fpage>
          -22438-6_
          <fpage>29</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bachmair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ganzinger</surname>
          </string-name>
          ,
          <article-title>Rewrite-based equational theorem proving with selection and simplification</article-title>
          ,
          <source>Journal of Logic and Computation</source>
          <volume>4</volume>
          (
          <year>1994</year>
          )
          <fpage>217</fpage>
          -
          <lpage>247</lpage>
          . doi:
          <volume>10</volume>
          .1093/ logcom/4.3.217.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Tseitin</surname>
          </string-name>
          ,
          <article-title>On the complexity of derivation in propositional calculus</article-title>
          , in: J. H.
          <string-name>
            <surname>Siekmann</surname>
          </string-name>
          , G. Wrightson (Eds.),
          <source>Automation of Reasoning: 2: Classical Papers on Computational Logic</source>
          <year>1967</year>
          -1970, Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>1983</year>
          , pp.
          <fpage>466</fpage>
          -
          <lpage>483</lpage>
          . doi:
          <volume>10</volume>
          . 1007/978-3-
          <fpage>642</fpage>
          -81955-1_
          <fpage>28</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Reger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Suda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Voronkov</surname>
          </string-name>
          ,
          <article-title>New techniques in clausal form generation</article-title>
          , in: C. Benzmüller, G. Sutclife, R. Rojas (Eds.),
          <source>GCAI 2016. 2nd Global Conference on Artificial Intelligence</source>
          , volume
          <volume>41</volume>
          of EPiC Series in Computing, EasyChair,
          <year>2016</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>23</lpage>
          . URL: https://easychair.org/publications/paper/XncX. doi:
          <volume>10</volume>
          .29007/dzfz.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <string-name>
            <surname>E</surname>
          </string-name>
          <year>2</year>
          .4
          <string-name>
            <given-names>User</given-names>
            <surname>Manual</surname>
          </string-name>
          ,
          <year>2019</year>
          . URL: http://wwwlehre.dhbw-stuttgart.de/~sschulz/ WORK/E_DOWNLOAD/V_2.4/eprover.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Reger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Suda</surname>
          </string-name>
          ,
          <article-title>Measuring progress to predict success: Can a good proof strategy be evolved?</article-title>
          ,
          <source>in: AITP</source>
          <year>2017</year>
          ,
          <year>2017</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>21</lpage>
          . URL: http://aitp-conference.org/
          <year>2017</year>
          / aitp17-proceedings.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sutclife</surname>
          </string-name>
          ,
          <article-title>The TPTP problem library and associated infrastructure</article-title>
          .
          <source>From CNF to TH0</source>
          ,
          <source>TPTP v6.4.0, Journal of Automated Reasoning</source>
          <volume>59</volume>
          (
          <year>2017</year>
          )
          <fpage>483</fpage>
          -
          <lpage>502</lpage>
          . doi:
          <volume>10</volume>
          .1007/ s10817-017-9407-7.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nonnengart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Weidenbach</surname>
          </string-name>
          ,
          <article-title>Computing small clause normal forms</article-title>
          , in: J. A.
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Voronkov (Eds.),
          <source>Handbook of Automated Reasoning (in 2 volumes)</source>
          ,
          <article-title>Elsevier and</article-title>
          MIT Press,
          <year>2001</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>367</lpage>
          . URL: https://doi.org/10.1016/b978-044450813-3/
          <fpage>50008</fpage>
          -
          <lpage>4</lpage>
          . doi:
          <volume>10</volume>
          .1016/b978-044450813-3/
          <fpage>50008</fpage>
          -4.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Riazanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Voronkov</surname>
          </string-name>
          ,
          <article-title>Limited resource strategy in resolution theorem proving</article-title>
          ,
          <source>Journal of Symbolic Computation</source>
          <volume>36</volume>
          (
          <year>2003</year>
          )
          <fpage>101</fpage>
          -
          <lpage>115</lpage>
          . doi:
          <volume>10</volume>
          .1016/S0747-
          <volume>7171</volume>
          (
          <issue>03</issue>
          )
          <fpage>00040</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Iverson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A Programming</given-names>
            <surname>Language</surname>
          </string-name>
          , John Wiley &amp; Sons, Inc., New York, NY, USA,
          <year>1962</year>
          . URL: https://dl.acm.org/doi/book/10.5555/1098666.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <source>The Elements of Statistical Learning</source>
          , Springer Series in Statistics, Springer New York Inc., New York, NY, USA,
          <year>2009</year>
          . doi:
          <volume>10</volume>
          .1007/b94608.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Kaliszyk</surname>
          </string-name>
          , J. Urban,
          <source>MizAR 40 for Mizar 40, Journal of Automated Reasoning</source>
          <volume>55</volume>
          (
          <year>2015</year>
          )
          <fpage>245</fpage>
          -
          <lpage>256</lpage>
          . doi:
          <volume>10</volume>
          .1007/s10817-015-9330-8.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Singer</surname>
          </string-name>
          , Learning to order things,
          <source>Journal Of Artificial Intelligence Research</source>
          <volume>10</volume>
          (
          <year>1999</year>
          )
          <fpage>243</fpage>
          -
          <lpage>270</lpage>
          . doi:
          <volume>10</volume>
          .1613/jair.587. arXiv:
          <volume>1105</volume>
          .
          <fpage>5464</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <article-title>Regression shrinkage and selection via the Lasso</article-title>
          ,
          <source>Journal of the Royal Statistical Society. Series B (Methodological) 58</source>
          (
          <year>1996</year>
          )
          <fpage>267</fpage>
          -
          <lpage>288</lpage>
          . URL: http://www.jstor.org/ stable/2346178.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , E. Duchesnay,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          . URL: https://scikit-learn.org/.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey on graph neural networks</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          . doi:
          <volume>10</volume>
          .1109/tnnls.
          <year>2020</year>
          .
          <volume>2978386</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>