Analysis of comparative effectiveness of
state-of-the-art heuristics for CDCL SAT solvers
                 Stepan Kochemazov
                 Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
                 E-mail: veinamond@gmail.com

                 Abstract. The Conflict-Driven Clause Learning algorithms for solving the Boolean
                 satisfiability problem comprise the major part of the methods used to solve various instances
                 of the problems that arise in industry and science. In recent years there have been proposed
                 several major heuristics for these algorithms which are assumed to be de facto good for the
                 solvers’ performance over diverse sets of benchmarks. The goal of this paper is to evaluate the
                 contribution of each separate heuristic to the performance of a state-of-the-art solver, see the
                 extent to which they are beneficial, and figure out if the heuristics have any particular features
                 that need to be taken into account.


1. Introduction
The Boolean satisfiability problem (SAT) [1] is one of the most well-studied NP-complete
problems in computer science. It is possible to effectively reduce a variety of problems from
diverse areas to SAT and effectively solve them in that form. The success of SAT today is
largely due to the development of the Conflict-Driven Clause Learning (CDCL) algorithm [2]
that was accomplished at the end of the XX-th century.
    Since SAT is NP-complete, modern implementations of CDCL employ a lot of heuristics.
Some of them can be considered baseline, such as the Variable State Independent Decaying Sum
(VSIDS) [3], restarts [4], or Literal Block Distance [5]. Others are either situational or were
proposed only recently. The latter include Learnt Clause Minimization [6], Distance heuristic
[7], Chronological Backtracking [8], Duplicate Learnts heuristic [9], and the use of Stochastic
Local Search component [10]. The interesting observation is that the majority of applications,
such as cryptography or bioinformatics, tend to use the older solvers developed prior to 2013,
such as MiniSAT [11] or glucose [5]. The new CDCL heuristics proposed in recent years have
yet to earn such acclaim.
    In the present paper, the goal is to evaluate the contribution of the state-of-the-art CDCL
heuristics proposed recently to the solvers’ effectiveness and compare them with each other.
The main studied question is whether the new heuristics do improve the solvers’ effectiveness
over a wide range of benchmarks or not. An additional goal is to understand the strengths and
weaknesses of these heuristics.
    A brief outline of the paper is as follows. The next section presents some background on
CDCL SAT solvers and the considered heuristics. The third section contains the methodology

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).
of experiments, including the description of several variants of the Relaxed LCMDCBDL newTech
SAT solver that took 2nd place in the SAT Competition 2020. These variants differ only
in the enabled heuristics. They are compared in the fourth section which is devoted to
the computational experiments and their discussion. It is followed by the conclusion and
acknowledgments.

2. Background
As was mentioned above, the modern CDCL SAT solvers are programmatic implementations of
the algorithms for solving the Boolean satisfiability problem. Essentially, they are the tools to
be used in practice for tackling hard problems, that can not be easily be solved using existing
effective approaches. The progress in CDCL SAT solvers is mainly realized via annual SAT
Competitions1 . These competitions use benchmark sets comprised of instances from diverse
application domains to test new heuristics and implementations of the CDCL algorithm. They
are popular with the developers of SAT solvers and related technologies. The emergence of
the MiniSAT solver in 2004 [11] marked an important stepping stone in the CDCL solvers
development thanks to the high quality of its programmatic implementation. To this day, the
majority of modern algorithms for solving SAT are either directly based on MiniSAT or employ
many of the implementation techniques it popularized. It would not be far off to say, that
MiniSAT can be viewed as one of the main progenitors of almost all modern SAT solvers. On
the heuristics side, MiniSAT implemented VSIDS and the frequent restarts. Later, the state of
the art in CDCL was extended by the concept of Literal Block Distance (LBD) [5]. However,
since then the progress in the area somewhat stagnated. Despite many new heuristics being
introduced, many SAT applications still employ MiniSAT and glucose solvers proposed more
than ten years ago.
    Nevertheless, the SAT competitions continue, and naturally, the heuristics which appear
in the winners of competitions are viewed with great interest. In the context of this paper,
several promising heuristics proposed in recent years will be considered. The brief outline and
description of the analyzed heuristics follow.
    • LCM: Learnt Clause Minimization – the heuristic proposed in [6], which is aimed at
      improving the quality of learnt clauses by applying to them a special procedure that makes
      it possible to remove the redundant literals from a clause.
    • DISTANCE: the distance heuristic was first introduced in [7]. Its goal is to initialize the
      values of the branching heuristic by directing the search at the beginning. It is relatively
      expensive, so in most implementations the heuristic works for at most 100 000 conflicts.
    • CB: Chronological Backtracking – this method described in [8] implements an attempt
      to make a partial return to the roots of SAT solving, in particular, to the
      Davis–Putnam–Logemann–Loveland (DPLL) algorithm. It allows the solver in certain
      conditions to ignore the nonchronological backtracking, which became one of the trademarks
      of CDCL solvers, in favor of a chronological one.
    • DL: Duplicate learnts heuristic [9] attempts to use simple data mining methods to discover
      the learnt clauses that are repeatedly derived during the search and store them permanently.
    • SLS: the Stochastic Local Search component augmented with Rephasing technique has been
      appearing in the growing number of solvers. While the idea of combining CDCL and SLS is
      far from novel, the implementation of the SLS component in Relaxed LCMDCBDL newTech
      [10] has several specific features that have not been used before.
  The implementation of each heuristic usually contributes major changes to some of the core
SAT solver components. Nevertheless, it is often possible to isolate the heuristic from the
1
    http://www.satcompetition.org/
rest of the algorithm. Since the goal of the present paper is to evaluate the heuristics and
compare their effectiveness, it was decided to perform the computational experiments using
the SAT solver which already implements all of the listed ones. In particular, there was used
the Relaxed LCMDCBDL newTech solver that took the second place at SAT Competition 2020.
Below it is referred to as RLNT. What makes it interesting compared to SAT Competition 2020
winner, the kissat solver, is that the lineage of RLNT can be easily followed back to Minisat
[11]. Indeed, RLNT is based on MapleLCMDistChronoBT-DL-v3 – the winner of SAT Race 2019,
which is based on MapleLCMDistChronoBT – the winner of SAT Competition 2018, based on
MapleLCMDist – the winner of SAT Competition 2017, based on MapleCOMSPS – the winner of
SAT Competition 2016, based on CominisatPS [12], based on Minisat and glucose. Informally,
it can be said that MapleLCMDist = MapleCOMSPS + LCM + DIST, MapleLCMDistChronoBT =
MapleLCMDist + CB, MapleLCMDistChronoBT-DL-v3=MapleLCMDistChronoBT+ DL. Finally,
RLNT = MapleLCMDistChronoBT-DL-v3 + SLS. The reality is more complex in the latter two
cases, since the difference between the corresponding solvers involves changes to some of the
parameters of existing heuristics.

3. Methodology of experiments
To evaluate the contribution of heuristics in equal conditions, there was developed the
implementation of RLNT, in which each of the heuristics listed in the previous section can
be enabled or disabled independently via the use of #ifdef mechanism. It means that by
manipulating the defined constants it is possible to change only the header of a single source
file, recompile the program and obtain, e.g. RLNT-noCB solver, which has the Chronological
Backtracking disabled. The general idea that is employed to compare the heuristics is to switch
them off separately. The greater the decrease in the solver’s performance from disabling a
heuristic, the greater is its benefit for the solver.

3.1. Solvers
For the experiments there have been prepared the following solvers that are the variants
of RLNT: RLNT itself, RLNT-noSLS, RLNT-noDL, RLNT-noCB, RLNT-noDIST, RLNT-noLCM,
RLNT-noDL-noDIST. Here, RLNT followed by no* refers to the version of RLNT in which
the heuristics corresponding to the solver name are disabled.          The latter variant,
RLNT-noDL-noDIST was evaluated because in the preliminary testing it showed good
performance. In addition to that, in order to evaluate the progress in CDCL solvers
from a slightly different perspective, the winners of SAT Competitions 2017-2019 were
also included: MapleLCMDist, MapleLCMDistChronoBT, MapleLCMDistChronoBT-DL-v3. To
offset them the versions of RLNT were prepared that coincide with the respective SAT
competition winners in the enabled heuristics, e.g.    RLNT-noSLS-noDL corresponds to
MapleLCMDistChronoBT, RLNT-noSLS-noDL-noCB corresponds to MapleLCMDist. Finally, the
version RLNT-noSLS-noDL-noCB-noDIST-noLCM is the variant in which all heuristics are disabled.

3.2. Benchmarks
To accurately assess the performance of CDCL SAT solvers it is necessary to use a diverse
set of benchmarks that come from different applications. The benchmark sets used in SAT
Competitions appear to satisfy this criterion, thus in this paper, they were used in the
experiments. In particular, the test instances from the main tracks of SAT Competitions 20162
and 20203 , and also SAT Race 20194 were used. They are freely available from the corresponding
2
  https://baldur.iti.kit.edu/sat-competition-2016/index.php?cat=downloads
3
  https://satcompetition.github.io/2020/downloads.html
4
  http://sat-race-2019.ciirc.cvut.cz/index.php?cat=downloads
websites. In total there were 1300 instances.

3.3. Evaluation criteria
Remind, that when a SAT solver is launched with a fixed time limit on some SAT instance,
there are three possible outcomes. First, it can find a satisfying assignment, meaning that a
SAT instance is satisfiable. Second, it can prove that there are no satisfying assignments and
thus the input formula is unsatisfiable. If the solver terminates due to the time limit, then the
status of the instance is unknown. Overall, taking into account the fact that a formula can have
many satisfying assignments, it may seem that satisfiable instances are significantly easier to
tackle for a SAT solver, compared to unsatisfiable instances, for which it is required to construct
a specific proof. Empirical observations show that it is indeed true but to a limited extent. Both
hard and simple cases of satisfiable and unsatisfiable instances are often encountered in practice.
    In the experiments, each considered solver was launched once on each instance from the
benchmark set. The outcome of the launch, together with the runtime of a solver was recorded.
The statistics presented below include the count of solved instances, divided into counts for
solved satisfiable and unsatisfiable ones. Similar to the SAT competitions, these numbers are
accompanied by the PAR-2 score (Penalized Average Runtime with factor 2) which is computed
as follows. First, a sum of terms is computed, where for each instance from the benchmark set
the term is formed as a runtime of a solver if it solved this instance, or as 2 times the time limit
if the solver did not. Then this sum is divided by the number of terms. Given the two solvers
that solved the equal number of instances, the one with the lower value of PAR-2 is faster on
average.

3.4. Computing platform
Following the standard SAT Competition procedure, all solvers were launched with a time limit
of 5000 seconds. As a computing platform the PCs with 16-core AMD Ryzen 3950x CPUs
and 32Gb RAM were used, operating under Ubuntu 20.10. The solvers were launched in 16
simultaneous threads without restrictions on memory usage.

4. Computational experiments
The results of the experiments are summarized in Table 1. The so-called cactus plots, that
present the performance of the solvers on satisfiable instances and unsatisfiable instances, are
showed in Figures 1 and 2. The plotline for each solver displays the runtimes of this solver
over all benchmarks, ordered in ascending order. It means that on the cactus plot, the further
the plotline is to the right – the larger number of instances were successfully tackled by the
corresponding solver within the time limit, and the closer the line is to the bottom – the smaller
is the average runtime of a solver over all benchmarks.

4.1. Discussion
The analysis of the results in Table 1, together with that presented in cactus plots at Figure
1 and 2 allows one to make several observations. First, it is quite clear that, at least on the
employed benchmark set, the heuristics do not yield equal gain. The first three rows of the
Table clearly show, that dropping some of the heuristics increases the solver’s performance. In
particular, it appears that disabling DL or DIST allow RLNT to solve several more satisfiabile
and unsatisfiable instances. Disabling them together yields even greater benefit. Therefore, it
is necessary to reevaluate the combination of heuristics for each particular solver if the goal is
to achieve the best possible performance. It means that when incorporating new heuristics into
a solver, it is best to study how it fits into the general picture. However, this makes it much
harder to perform straightforward fair comparisons. Another interesting observation that can
Table 1. The results of computational evaluation of different solvers on the joint set of
benchmarks from SAT Competitions 2016 and 2020 and SAT Race 2019 (1300 tests in
total). SCR (Solution Count Ranking) refers to the total number of solved instances.SCR:SAT
and SCR:UNSAT refer to the numbers of solved satisfiable and unsatisfiable benchmarks,
respectively. PAR2 column refers to penalized average runtime with parameter 2.
      Solver                                  SCR SCR:SAT SCR:UNSAT PAR2
      RLNT-noDL-noDIST                        726    405              321       4841.34
      RLNT-noDIST                             721    400              321       4875.14
      RLNT-noDL                               721    403              318       4883.67
      RLNT                                    715    399              316       4904.70
      RLNT-noCB                               697    378              319       5024.10
      RLNT-noSLS-noDL                         694    344              350       5179.37
      RLNT-noLCM                              684    387              297       5116.70
      MapleLCMDistChronoBT-DL-v3              682    331              351       5252.24
      RLNT-noSLS-noDL-noCB                    682    337              345       5270.08
      RLNT-noSLS                              671    331              340         5300
      RLNT-noSLS-noDL-noCB-noDIST-noLCM 670          340              330       5341.44
      MapleLCMDistChronoBT                    665    312              353       5413.83
      MapleLCMDist                            658    303              355       5483.10


                                                                        SAT
                 5000
                            RLNT-noDL-noDIST
                            RLNT-noDL
                            RLNT-noDIST
                            RLNT
                            RLNT-noLCM
                            RLNT-noCB
                 4000       RLNT-noSLS-noDL
                            RLNT-noSLS-noDL-noCB-noDIST-noLCM
                            RLNT-noSLS-noDL-noCB
                            MapleLCMDistChrBt-DL-v3
                            RLNT-noSLS
                            MapleLCMDistChronoBT
                            MapleLCMDist
                 3000
Time (seconds)


                 2000


                 1000


                   0
                        0             50                100     150   200         250    300        350    400
                                                                      Instances


                                Figure 1. Cactus plot for considered solvers over satisfiable instances.


be drawn from Figure 1 is the following: it is clear that the SLS heuristic appears to give the
solver the ability to cope with at least ≈ 50 more satisfiable test instances from the considered
set of benchmarks than the competition, which is quite surprising. It is the SLS heuristic that
is responsible for the large gap between two groups of plot lines at Figure 1. On the other
                                                                            UNSAT
                 5000
                            MapleLCMDist
                            MapleLCMDistChronoBT
                            MapleLCMDistChrBt-DL-v3
                            RLNT-noSLS-noDL
                            RLNT-noSLS-noDL-noCB
                            RLNT-noSLS
                 4000       RLNT-noSLS-noDL-noCB-noDIST-noLCM
                            RLNT-noDL-noDIST
                            RLNT-noDIST
                            RLNT-noCB
                            RLNT-noDL
                            RLNT
                            RLNT-noLCM
                 3000
Time (seconds)


                 2000


                 1000


                   0
                        0                50                     100   150               200   250   300    350
                                                                            Instances


                              Figure 2. Cactus plot for considered solvers over unsatisfiable instances.


hand, from Table 1 it can be seen that dropping SLS results in a significant gain (about 30) in
the number of unsatisfiable instances solved, which is, however, overshadowed by the decrease
in the number of solved satisfiable tests. Apart from SLS, the next best heuristic appears
to be LCM which mostly targets the unsatisfiable instances. The CB heuristic which helps
the solver a lot with satisfiable instances takes third place. Another observation is that the
overall architecture of SAT solvers indeed progressed slightly in recent years, since, for example,
RLNT-noSLS-noDL shows better performance compared to MapleLCMDistChronoBT which uses
the same set of heuristics.
   Comparing the cactus plots at Figures 1 and 2 one can conclude that since 2017, the
solvers started to switch focus to satisfiable instances, often at the cost of unsatisfiable ones.
In particular, the MapleLCMDist solver from 2017 managed to solve the largest number of
unsatisfiable instances and the smallest number of satisfiable ones out of all tested solvers.
The switch in focus, however, is accompanied by overall progress that results in decreasing the
average solver runtimes.
   Now, let us look at the detailed statistics on satisfiable and unsatisfiable instances solved by
each solver in three benchmark sets, corresponding to SAT Competition 2016 (SC2016), SAT
Race 2019 (SR2019), and SAT Competition 2020 (SC2020). It is presented in Table 2. The
horizontal line after the sixth row separates the solvers with SLS and the solvers without this
heuristic. Observe, that for both SC2016 and SR2019 the enabling of SLS leads to the increase
of the number of satisfiable instances solved by a moderate amount, from 3 to 12. On the
other hand, for SC2020 the difference induced by SLS is at least 25 instances. It means that
the benchmark set for SC2020 is (hopefully, inadvertently) biased towards the SLS heuristic.
As it was mentioned above, enabling SLS in all cases results in the decrease of unsatisfiable
instances solved, but by a relatively small amount. From the analysis of other results, it can be
conjectured, that the benchmark set from SC2016 is biased towards LCM, but to a significantly
smaller degree. Observe that disabling LCM results in ≈ 10 less unsatisfiable instances solved
in all cases. As for the SR2019 benchmark series, it appears to be more or less balanced and does
not appear to be biased towards any specific heuristic. However, it should be taken into account
that the proportion of satisfiable instances solvable under a time limit to that of unsatisfiable
instances. From Table 2 it is clear that SC2016 favors heuristics targeting unsatisfiable instances,
while SR2019 and SC2020 benchmark sets are better for solvers with heuristics that work well
on satisfiable instances.

        Table 2. Detailed statistics on satisfiable and unsatisfiable instances solved.
                                                SC2016            SR2019            SC2020
                                           SAT UNSAT SAT UNSAT SAT UNSAT
 RLNT-noDL-noDIST                          099       121      159       095      147    105
 RLNT-noDIST                               095       120      154       095      151    106
 RLNT-noDL                                 090       119      161       094      152    105
 RLNT                                      089       118      159       094      151    104
 RLNT-noCB                                 085       122      156       094      137    103
 RLNT-noLCM                                087       107      159       089      141    101
 RLNT-noSLS-noDL                           083       139      150       099      111    112
 MapleLCMDistChronoBT-DL-v3                077       142      145       098      109    111
 RLNT-noSLS-noDL-noCB                      080       139      147       099      110    107
 RLNT-noSLS                                081       135      142       097      108    108
 RLNT-noSLS-noDL-noCB-noDIST-noLCM         081       128      147       094      112    108
 MapleLCMDistChronoBT                      076       145      141       099      095    109
 MapleLCMDist                              077       147      138       100      088    108

   Taking into account all of the above, it is quite clear, that from the point of view of application
domains, where the majority of SAT instances to solve are unsatisfiable, the modern CDCL
heuristics are not very enticing. On the other hand, the domains which work mainly with
satisfiable instances are likely to benefit from using the cutting-edge SAT competition winners.
Overall, the more heuristics exist – the better, because since SAT is the NP-complete problem,
it means that in advance, no one can predict whether some heuristic will work on the particular
class of instances or not. That is why the choice makes the area richer with possibilities for
improvement.

5. Conclusion
The results of the computational experiments indicate that in the CDCL SAT solvers that employ
many separate heuristics it is important to sometimes spend time on reviewing their performance
and tuning the existing configurations because sometimes disabling a heuristic can result in the
performance gain comparable to that from adding a brand new one. The predispositions of some
heuristics towards satisfiable or unsatisfiable instances should also be factored into the choice of
whether to employ them in each particular practical case.

Acknowledgments
The research was supported by the RFBR grant No.19-07-00746.

References
 [1] Biere A, Heule M, van Maaren H and Walsh T (eds) 2009 Handbook of Satisfiability (Frontiers in Artificial
       Intelligence and Applications vol 185) (IOS Press) ISBN 978-1-58603-929-5
 [2] Marques-Silva J P, Lynce I and Malik S 2009 Conflict-driven clause learning SAT solvers in Biere et al. [1]
       pp 131–153
 [3] Moskewicz M W, Madigan C F, Zhao Y, Zhang L and Malik S 2001 Chaff: Engineering an efficient SAT
        solver Proceedings of the 38th Annual Design Automation Conference DAC ’01 pp 530–535
 [4] Luby M, Sinclair A and Zuckerman D 1993 Inf. Process. Lett. 47 173–180
 [5] Audemard G and Simon L 2009 Predicting learnt clauses quality in modern SAT solvers IJCAI pp 399–404
 [6] Luo M, Li C, Xiao F, Manyà F and Lü Z 2017 An effective learnt clause minimization approach for CDCL
        SAT solvers IJCAI pp 703–711
 [7] Xiao F, Luo M, Li C M, Manya F and Lu Z 2017 MapleLRB LCM, Maple LCM, Maple LCM Dist,
        MapleLRB LCMoccRestart and glucose-3.0+width in SAT competition 2017 Proc. of SAT Competition
        2017 vol B-2017-1 pp 22–23
 [8] Nadel A and Ryvchin V 2018 Chronological backtracking SAT (LNCS vol 10929) pp 111–121
 [9] Kochemazov S, Zaikin O, Semenov A A and Kondratiev V 2020 Speeding up CDCL inference with duplicate
        learnt clauses ECAI 2020 (FAIA vol 325) pp 339–346
[10] Zhang X and Cai S 2020 Relaxed backtracking with rephasing Proc. of SAT Competition 2020 vol B-2020-1
        pp 15–16
[11] Eén N and Sörensson N 2004 An extensible sat-solver SAT 2004 (LNCS vol 2919) pp 502–518
[12] Oh C 2015 Between SAT and UNSAT: The fundamental difference in CDCL SAT SAT (LNCS vol 9340) pp
        307–323