Selecting Quantifiers for Instantiation in SMT

Jan Jakubův¹, Mikoláš Janota¹, Bartosz Piotrowski¹, Jelle Piepenbrock¹ and Andrew Reynolds²

¹ Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Czechia
² Computational Logic Center, University of Iowa, USA

jakubuv@gmail.com (J. Jakubův); mikolas.janota@gmail.com (M. Janota); bartoszpiotrowski@post.pl (B. Piotrowski); jellepiepenbrock@gmail.com (J. Piepenbrock); andrew.j.reynolds@gmail.com (A. Reynolds)
https://people.ciirc.cvut.cz/~jakubja5/ (J. Jakubův); https://people.ciirc.cvut.cz/~janotmik/ (M. Janota); http://bartoszpiotrowski.pl/ (B. Piotrowski); https://homepage.cs.uiowa.edu/~ajreynol/ (A. Reynolds)

© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
This work attempts to choose quantifiers to instantiate based on feedback from the SMT solver. This tackles the challenge that if a problem contains many quantified expressions, the solver eventually gets flooded by too many generated ground terms. Therefore, we instantiate only some of the quantifiers, but the question is: which ones are useful? The SMT solver is modeled as a multi-armed bandit, where each quantifier is a lever producing a positive reward if an instantiation of the quantifier was useful. The issue is that we do not know whether an instantiation was useful until the given problem instance is actually proven (and then it is too late). In this paper we explore several heuristics that attempt to assess the usefulness of quantifier instantiations.

Keywords
satisfiability modulo theories, quantifier instantiation, machine learning, reinforcement learning

1. Introduction

Satisfiability modulo theories (SMT) solvers abstract quantifiers as propositions that are sources of instantiations. So a quantified sub-formula ∀x⃗. φ is abstracted as a Boolean variable Q, and instantiations are added in the form of an implication Q ⇒ φ[x⃗ ↦ t⃗] for some ground terms t⃗. These ground terms are chosen from the set of ground terms already present in the ground formula. Consider the following toy formula:

    f(3) < 0 ∧ ((∀x. f(x) > 0) ∨ (∀x. f(x) > x)),   where f : int → int.

To obtain a refutation, we abstract and instantiate as follows:

    abstraction:          Q1 ≡ (∀x. f(x) > 0),  Q2 ≡ (∀x. f(x) > x)
    abstracted formula:   f(3) < 0 ∧ (Q1 ∨ Q2)
    instantiation 1:      Q1 ⇒ f(3) > 0
    instantiation 2:      Q2 ⇒ f(3) > 3

Most approaches instantiate quantifiers gradually, meaning that new instantiations are added only after testing that the old ones do not already yield a contradiction in the ground solver. In this work, we propose to use a dynamic, machine-learning-based heuristic to decide which quantifiers should be instantiated during solving. We model the problem as a multi-armed bandit (MAB), one of the simplest forms of reinforcement learning [1]. A multi-armed bandit is equipped with a set of levers, which are pulled one at a time. Pulling a lever leads to a reward (typically a real number). The multi-armed bandit problem is to find a strategy that leads to the highest total reward. Our scenario is more complicated because quantifiers influence each other and the SMT solver has a state that changes with each instantiation. Hence, we apply the MAB model as a simplification of the problem and as a first step in this area of research.
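To see concretely why the two instantiations refute the toy formula above, the following minimal Python sketch (purely illustrative, not part of cvc5; all names are ours) enumerates truth values for Q1, Q2 and a bounded range of integer values for the ground term f(3), and confirms that no assignment satisfies the abstracted formula together with both instantiation lemmas. A real SMT solver establishes the same fact symbolically in its ground solver.

    from itertools import product

    def ground_clauses(q1, q2, f3):
        """Ground abstraction of the toy formula plus the two instantiation lemmas."""
        return [
            f3 < 0,               # f(3) < 0
            q1 or q2,             # abstracted disjunction Q1 ∨ Q2
            (not q1) or f3 > 0,   # instantiation 1: Q1 ⇒ f(3) > 0
            (not q2) or f3 > 3,   # instantiation 2: Q2 ⇒ f(3) > 3
        ]

    # Search for a model over Booleans for Q1, Q2 and a small integer range for f(3).
    models = [
        (q1, q2, f3)
        for q1, q2, f3 in product([False, True], [False, True], range(-10, 11))
        if all(ground_clauses(q1, q2, f3))
    ]
    print(models)  # [] -- no model exists, so the ground part is unsatisfiable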
The overall setup is shown in Figure 1. Quantifiers are instantiated in a loop that terminates when the ground solver determines that the ground part is unsatisfiable. Our guidance picks the set of quantifiers to be instantiated in each round; this guidance is based on feedback obtained from the ground solver.

Figure 1: Schematic of the SMT solver with ML guidance for quantifier instantiation.

In this paper, we only consider the simplest strategy for picking the ground terms that we instantiate with: enumerative instantiation [2, 3]. There, ground terms are enumerated from the Herbrand universe while almost completely ignoring the semantics of the formula at hand. However, more advanced techniques exist [4, 5], and a similar approach could be integrated into them. In the standard setting, these techniques would instantiate all quantifiers that are true in the current ground model. This can rapidly bloat the ground part, which not only makes it harder to solve but also renders the space of possible instantiations unwieldy. We also remark that we have previously applied machine learning to ordering terms for instantiation within cvc5 [6]; this could also be combined with the approach presented here.

2. Picking Quantifiers

In the standard MAB setting, only one lever is pulled at a time. This is not desirable in our setting, because it could lead to an unwieldy number of instantiation rounds. Therefore, we always pick a set of quantifiers (levers) to instantiate in each round. The selection is based on a score assigned to each quantifier, calculated from the observations made so far (Section 3). There are multiple ways such scores could be used. In our experiments, we use a simple strategy where the top p percent of quantifiers are selected in each round, where p is a fixed parameter, as sketched below.
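The following Python fragment is a sketch of this selection rule (illustrative only; the function and parameter names are ours, not cvc5's API). It ranks quantifiers by their current score and returns the top p percent for instantiation in the current round.

    def select_top_percent(scores, p):
        """scores: dict mapping a quantifier to its current score; p: percentage in (0, 100]."""
        ranked = sorted(scores, key=scores.get, reverse=True)
        count = max(1, round(len(ranked) * p / 100))  # instantiate at least one quantifier
        return ranked[:count]

    # Example: with p = 50, the two best-scoring of four quantifiers are selected.
    print(select_top_percent({"q1": 0.9, "q2": 0.1, "q3": 0.4, "q4": 0.7}, 50))  # ['q1', 'q4']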
3. Scores and Reward Heuristics

The objective is to assign to each quantifier a score that determines how likely the quantifier is to be instantiated. This score combines two components. The first component estimates how useful the quantifier has been in the past. The second component pushes the search towards quantifiers that have been underused so far. This falls within the well-known reinforcement learning paradigm of exploration versus exploitation.

The current implementation uses two values obtained from the SAT solver as sources of quantifier usefulness, namely the difficulty and activity measures. Both of these sources provide an integer statistic for each quantifier, with values evolving in time. We further normalize the values by dividing them by the current number of quantifier instantiations, thus obtaining a utility per quantifier instance. We use these values to estimate the usefulness of quantifiers and to guide quantifier instantiation.

In our implementation, difficulty is a mapping from formulas (those from the input or lemmas generated during solving) to natural numbers, where a formula with a higher difficulty contributes more towards a proof of unsatisfiability. This measure is updated incrementally during SMT solving. In particular, when a theory solver generates a conflict clause or lemma φ, we compute the literals ℓ that are propositionally entailed by the negation of φ (e.g., the negations of all literals in a clause); call this set ilits(φ). For each such ℓ, we increment the difficulty measure of the (oldest) formula ψ such that the negation of ℓ is in ilits(ψ). We subsequently track this difficulty measure on the generated lemma φ. Notice that quantifier instantiation lemmas Q ⇒ φ[x⃗ ↦ t⃗] increment the difficulty of the oldest formula that implies Q. The difficulty measure of the lemma itself is incremented whenever literals in the instantiated body of the quantified formula participate in further conflicts and lemmas. Similarly, we measure activity, which tracks how many times a formula has participated in propositional propagation.

As a basic estimate of quantifier quality, we train a simple linear regression model to obtain reasonable weights α and β for the linear combination of normalized activity (nact) and normalized difficulty (ndiff): α · nact + β · ndiff. As training data, we collect statistics of the values of nact and ndiff from runs on several thousand problems evaluated with a 60-second time limit. The statistics contain the values of nact and ndiff for each quantifier in every instantiation round, thus recording the progress of the values in time. Quantifiers whose instantiations were needed to solve the problem are marked as positive, the other quantifiers as negative. From this training data, we train a linear regression classifier. The training yielded the coefficients α = 0.04 and β = 0.1, which are used for the experimental evaluation in Section 4.

The prediction computed by the linear formula can be directly used as the quality estimate, or it can be used as a reward in the MAB setting as follows. To include the exploration factor we use the upper confidence bound (UCB) formula, where the quality Q_t(q) of the quantifier q at time step t (quantifier selection round) is defined as

    Q_t(q) = R_t(q) + c · √(log(t) / N_t(q)),

where R_t(q) is the mean reward for the quantifier q so far, and N_t(q) is the number of times q was selected for instantiation up to this point. As rewards, we use the predictions of quantifier quality obtained from the linear model described above. Finally, c is the confidence value controlling the level of exploration with respect to exploitation. The idea behind UCB is that the quality is primarily governed by the reward. The second summand is higher for quantifiers that were selected less often in previous instantiation rounds (quantifiers with smaller N_t(q)). Hence the quality of unselected quantifiers keeps increasing and they will eventually be selected in some future round. How fast this happens is controlled by the parameter c, where higher values favor exploration.
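The following Python sketch shows how the learned linear reward and the UCB bonus combine into the quality used for ranking. It is again illustrative rather than the cvc5 implementation; the names and the handling of never-selected quantifiers are our own choices for the sketch.

    import math

    ALPHA, BETA = 0.04, 0.1   # coefficients obtained from the linear regression above

    def linear_reward(nact, ndiff):
        """Predicted usefulness from normalized activity and normalized difficulty."""
        return ALPHA * nact + BETA * ndiff

    def ucb_quality(mean_reward, times_selected, t, c):
        """UCB quality Q_t(q) = R_t(q) + c * sqrt(log(t) / N_t(q)) at selection round t."""
        if times_selected == 0:
            return math.inf   # our convention: never-selected quantifiers are explored first
        return mean_reward + c * math.sqrt(math.log(t) / times_selected)

    # Example: a quantifier selected only twice in 100 rounds receives a large
    # exploration bonus even if its mean linear reward so far is modest.
    print(ucb_quality(mean_reward=linear_reward(0.5, 0.3), times_selected=2, t=100, c=3.0))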
4. Experiments

The ideas presented above were implemented in the SMT solver cvc5 [7], with all other instantiation techniques turned off, focusing only on enumerative instantiation [2, 3]. A subset of 5000 SMT-LIB benchmark problems is used for the experiments; we deliberately select problems with a large number of quantifiers. The experiments were run on a server with four AMD EPYC 7513 32-core processors @ 2.6 GHz and 504 GB of memory. The default technique, where all available quantifiers are instantiated in each round, is used as the baseline. To assess the performance of our techniques for quantifier selection, we also evaluate several random selection strategies. In each such strategy, k% of randomly selected quantifiers are instantiated, with k ranging from 10% to 90% in steps of 10%.

Apart from these control strategies, we evaluate our linear model from Section 3, where the quality of each quantifier is computed as a linear combination of the two utility values nact and ndiff. The linear model is evaluated in two modes, namely with and without the use of the UCB formula. As with the random strategies, we evaluate several variants with different selection ratios k, selecting the k% of quantifiers with the highest quality. In the case of the UCB formula, an additional confidence coefficient c is required to control the level of exploration; we evaluate several values of c between 0.001 and 10.

The ML approach performs well with short timeouts. With a timeout of 1 s, the baseline strategy, which instantiates all available quantifiers, solved 1107 problems. The worst of the random selection strategies (k = 10%) solved 1016 problems, and the best one (k = 50%) already outperformed the baseline with 1171 solved problems. The linear model scored slightly higher with 1186 problems, achieved by its best variant with k = 50%. The UCB formula led to almost identical performance with 1190 solved problems, achieved by its variant with k = 80% and c = 3. Our techniques perform slightly better than the random selection: the two random strategies solved 1212 problems together, while the two ML strategies solved 1264.

The experiment progress is depicted in Figure 2, where the graphs show the number of problems solvable by each of the strategies in time t. Both graphs use the baseline strategy as the reference. The left graph compares the baseline to the random selection, while the right graph compares it to the best ML-based selection technique, namely the linear model with the UCB formula. We see that the line for random 10% performs more poorly than the baseline. On the other hand, the random 50% line stays above the baseline up to ∼60 s. The experiment shows the versions with and without the UCB formula to be of comparable performance. The right graph, which compares our best ML selection method with the best random selection, supports our previous observation that the ML selection slightly outperforms the random one. The ML approach stays above the baseline up to ∼100 s. The methods are slightly complementary: the version without UCB adds 6.4% to the problems solved by the UCB version.

Figure 2: Problems solved by various quantifier selection methods.

While our strategies outperform the random selection, the improvement is rather small. This is further supported by Figure 3, which compares the strategies in a scatter plot. For each solved problem p, a point is plotted at the position (t1, t2), where t_i is the runtime of strategy s_i on problem p. In both graphs, s1 is the baseline strategy, while s2 is the random selection of 50% in the left graph and the linear model with UCB in the right graph. Hence, points below the diagonal indicate a runtime improvement, that is, problems solved faster than by the baseline. For the ML approach we observe a cluster of points below the diagonal with times below 10 s. For larger times we observe both improvements and deteriorations of performance.

Figure 3: Runtime comparisons of the baseline and quantifier selection methods.

5. Conclusions and Future Work

In this extended abstract we report on our initial experiments that attempt to estimate the usefulness of quantifiers during search. This contrasts with standard machine learning techniques that learn from one problem to another (cf. [8, 9, 6]).
In our case, the objective is to obtain feedback from the problem being solved. To the best of our knowledge, this is the first time such an approach has been attempted. Eventually, machine learning across multiple instances could be combined with single-instance learning.

The experiments suggest that guided quantifier selection can lead to a runtime improvement and hence to more problems solved with very short timeouts. It seems, however, that the improvement over the baseline strategy decreases with increasing time limits. In our future research, we plan to investigate whether an improvement at higher time limits can be achieved with more advanced methods for quantifier selection. In particular, we plan to investigate other metrics for the usefulness of quantifiers.

Acknowledgments

The results were supported by the Ministry of Education, Youth and Sports within the dedicated program ERC CZ under the project POSTMAN no. LL1902, the European Regional Development Fund under the Czech project AI&Reasoning no. CZ.02.1.01/0.0/0.0/15_003/0000466, and Amazon Research Awards. This article is part of the RICAIP project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 857306.

References

[1] R. S. Sutton, A. G. Barto, Reinforcement Learning, Adaptive Computation and Machine Learning series, 2nd ed., Bradford Books, Cambridge, MA, 2018.
[2] A. Reynolds, H. Barbosa, P. Fontaine, Revisiting enumerative instantiation, in: Tools and Algorithms for the Construction and Analysis of Systems, volume 10806, 2018, pp. 112–131. doi:10.1007/978-3-319-89963-3_7.
[3] M. Janota, H. Barbosa, P. Fontaine, A. Reynolds, Fair and adventurous enumeration of quantifier instantiations, in: Formal Methods in Computer-Aided Design, 2021.
[4] D. Detlefs, G. Nelson, J. B. Saxe, Simplify: A theorem prover for program checking, J. ACM 52 (2005) 365–473. doi:10.1145/1066100.1066102.
[5] Y. Ge, L. M. de Moura, Complete instantiation for quantified formulas in satisfiability modulo theories, in: Computer Aided Verification, 21st International Conference, CAV, 2009, pp. 306–320. doi:10.1007/978-3-642-02658-4_25.
[6] M. Janota, J. Piepenbrock, B. Piotrowski, Towards learning quantifier instantiation in SMT, in: Theory and Applications of Satisfiability Testing – SAT 2022, LIPIcs, 2022.
[7] H. Barbosa, C. W. Barrett, M. Brain, G. Kremer, H. Lachnitt, M. Mann, A. Mohamed, M. Mohamed, A. Niemetz, A. Nötzli, A. Ozdemir, M. Preiner, A. Reynolds, Y. Sheng, C. Tinelli, Y. Zohar, cvc5: A versatile and industrial-strength SMT solver, in: D. Fisman, G. Rosu (Eds.), Tools and Algorithms for the Construction and Analysis of Systems - 28th International Conference, TACAS 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS, volume 13243 of Lecture Notes in Computer Science, Springer, 2022, pp. 415–442. doi:10.1007/978-3-030-99524-9_24.
[8] J. Urban, G. Sutcliffe, P. Pudlák, J. Vyskocil, MaLARea SG1 – machine learner for automated reasoning with semantic guidance, in: A. Armando, P. Baumgartner, G. Dowek (Eds.), Automated Reasoning, 4th International Joint Conference, IJCAR 2008, Sydney, Australia, August 12-15, 2008, Proceedings, volume 5195 of Lecture Notes in Computer Science, Springer, 2008, pp. 441–456. doi:10.1007/978-3-540-71070-7_37.
[9] J. Piepenbrock, T. Heskes, M. Janota, J. Urban, Guiding an automated theorem prover with neural rewriting, in: J. Blanchette, L. Kovács, D.
Pattinson (Eds.), Automated Reasoning – 11th International Joint Conference, IJCAR 2022, Haifa, Israel, August 8-10, 2022, Proceedings, volume 13385 of Lecture Notes in Computer Science, Springer, 2022, pp. 597–617. doi:10.1007/978-3-031-10769-6_35.