Investigating Characteristics and Differences Between Easy and Hard SAT Instances (Extended Abstract) Teofil Sidoruk1 Institute of Computer Science PAS, Warsaw t.sidoruk@ipipan.waw.pl Abstract. Following our comparison of the efficiency of SAT-solvers [19, 20], we analyse DIMACS input files previously generated for benchmarking purposes in an attempt to pinpoint some common characteristics for the CNF formulas that were relatively easier to process, i.e., were verified faster than comparable instances of the same size. 1 Introduction Since the early 1970s, when Cooke first proved it to be NP-complete, the Boolean satisfiability problem, or SAT, has undergone a dramatic rise in importance. From the subject of purely theoretical research in the area of computational complexity, SAT- solving algorithms have become the cornerstone of a broad range of important practical applications that rely on their efficiency. They include, but are not limited to: verification [1, 2], (un)bounded model checking [5, 7, 13, 24, 25], planning [15], and composition of web services [18]. It is equally important to note that the theoretical aspects of SAT also remain the subject of keen scientific interest. In our recent papers [19–21], we presented notable SAT-solvers, both state-of-the- art and historical, comparing their efficiency at several computational problems of vary- ing complexity: from P-complete chess problems to EXPTIME-complete Towers of Hanoi puzzle. One obvious observation stemming from our comparison is that no sin- gle SAT-solver is superior to others in the sense that it always performs faster regardless of the input. In other words, solving SAT remains a considerable challenge: despite the incredible progress made, especially in the last fifteen years, the potential for further improvement is as large as ever. The focus of this paper is not on SAT-solvers as such, but rather, on the input Boolean formulas themselves. Specifically, we will investigate CNF formulas that are comparatively easier or harder to verify compared to other generated instances of the same size, attempting to identify some common patterns in their properties. Given that this area that has not really been previously explored, this work is aiming to be an initial, small step rather than an exhaustive investigation. The rest of this paper is organized as follows. The next section summarises existing work related to the subject. Section 3 shortly presents DIMACS, the standardised input format used by SAT-solvers, as well as details the generation and analysis of input CNF formulas. In Section 4 experimental results are compared and discussed. The final section contains conclusions. 2 Theoretical Overview and Related Work In this section we discuss prior research into the difficulty of randomly generated in- stances of NP-complete problems. Many well-known, classical computational problems, though NP-complete, are rel- atively easy to solve when it comes to typical instances [6]. The graph k-coloring prob- lem, for example, was found to be solvable in logarithmic time in the vast majority of cases [23]. On the other hand, since their complexity was proven in Karp’s seminal 1972 paper [14], we are bound to encounter hard instances eventually. This brings about the question of whether there is any pattern to be found in the distribution of difficulty in a set of randomly generated instances, which has been the subject of research since at least the early 1990s. In the rest of this paper, we will focus on the Boolean satisfiability problem (SAT), since it serves as the convenient ’common denominator’ to which other hard problems are often translated. It has been long observed that certain specific instances of SAT pose an unusu- ally significant challenge to the the DPLL algorithm, contrary to perceived average difficulty. In [6] Cheeseman, Kanefsky and Taylor summarise classical NP-complete problems using ’order parameters’. For example, a set of instances of the Hamiltonian path problem can be ordered by the average connectivity of their respective graphs: the higher the connectivity, the higher the chance for a Hamiltonian path to exist. Further- more, the authors show the existence of a phase transition at the boundary marked by some critical value of the order parameter, which separates two distinct regions of likely satisfiable and likely unsatisfiable instances, both of which are comparatively easy to verify. It is at the boundary that the hardest instances occur. This phase transition is investigated further by Gent and Walsh in [11]. Their ex- perimental results confirm the association of hardest instances of problems with the boundary, and that median problem difficulty generally follows the expected easy-hard- easy pattern. However, they also show that the distribution of difficulty is significantly more complex, and in particular note the presence of a region where instances can be extraordinarily difficult, sometimes orders of magnitude harder than those closest to the phase transition. Gent and Walsh postulate the ’constraint gap’ to cause such unexpectedly hard prob- lems to occur in an otherwise satisfiable region. In the DPLL algorithm, neither unit propagation nor pure literal elimination ever branch out the search, leaving splitting (i.e., the choice of the branching literal) as the only critical point which can potentially result in an exponential blow-up in the number of explored assignments. This naturally leads to the conclusion that the harder the instance, the more the algorithm is forced to use the splitting rule compared to the other two. In other words, the hardest instances are ’constrained’ in the sense they have just enough constraints to be unsatisfiable, but very few more (or even none), forcing DPLL to utilize heuristics-based branching and thus increasing verification time dramatically. These results were further experimentally confirmed in other papers, including anal- yses for 3-SAT formulas by Larrabee and Tsuji [16] and by Crawford and Auton [8], with the latter work focusing on how the percentage of satisfiable instances changes as a function of the clause/variable ratio of the formula. 2 3 The Input Format, Formulas and Analysis In this section we present the DIMACS input format commonly used by SAT-solvers, discuss the input files used for our analysis and the factors taken into account during the latter. The renewed scientific interest in the Boolean satisfiability problem, and in partic- ular the emergence of SAT Competitions in the early 2000s, resulted in the need of a single, unified input file format. DIMACS has become such a standard. The format uses plain text to represent a Boolean formula in conjunctive normal form (CNF). Following an optional comment line and a header containing the number of clauses and literals in the formula, each subsequent line corresponds to a new clause. Variables are represented by subsequent natural numbers, with the minus sign denot- ing negation. Spaces separate literals in clauses, and zeroes signal end of clause. An example of a very simple CNF formula in the DIMACS format is shown below. Listing 1.1. The input file corresponding to a simple formula (x1 ∨ ¬x3) ∧ (x2 ∨ x3 ∨ ¬x1) c Example DIMACS i n p u t p cnf 3 2 1 −3 0 2 3 −1 0 The input files contain formulas resulting from the translations to SAT of several NP-complete problems, including classical graph problems (vertex coloring, vertex cover, Hamiltonian path) [14], as well as the extended string-to-string correction prob- lem (ESCP). They were originally created as benchmarks for our previous work, i.e., a comparison of SAT-solvers [19, 20]. For the purposes of this analysis, we have identified and separated groups of the most and least difficult instances for each of the aforementioned problems, that is, the input files whose processing required the most and the least time, respectively. When calculating verification time, the average of individual solvers’ processing times was considered. The solvers used were the same as in the aforementioned comparison: Lin- geling and Plingeling [4], Glucose and Glucose-syrup [3], Clasp [10], Minisat [22], ManySAT [12] and Microsoft Z3 [9]. However, zChaff [17] was excluded due to its age and inability to process many instances in reasonable time, which would have consid- erably skewed the average. The DIMACS files in both groups were subsequently analysed and compared w.r.t. factors such as total number of literals and clauses, average and maximum clause length, percentage of negative literals and percentage of Horn clauses. 4 Results In this section we discuss and compare the results of our analysis. In Tab. 1, several characteristics are compared between the groups of easiest and hardest generated instances of vertex k-colouring and vertex k-cover. For the former, parameters of n = 100 (graph size) and k = 10 (number of colours) were set. For the latter, graphs of size n = 50 were generated, with the vertex cover size at k = 30. 3 Vertex k-colouring Vertex k-cover Easier instances Harder instances Easier instances Harder instances Avg running time 0.018 s 364.759 s 0.108 s 38.941 s Avg number of clauses 24507 36007 36774 36994 Avg number of literals 49813 72813 74967 88127 Avg clause size 2.033 2.022 2.039 2.382 Longest clause 10 10 60 60 Negative literals 97.96% 98.63% 98.04% 83.41% Horn clauses 0% 0% 99.03% 99.34% Table 1. Comparison of characteristics between easier and harder instances of the vertex k- colouring and k-cover problems. Hamiltonian path String correction Easier instances Harder instances Easier instances Harder instances Avg running time 4.810 s 34.101 s 0.538 s 42.968 s Avg number of clauses 8000200 8000200 49808 49808 Avg number of literals 20135132 17115771 318476 318476 Avg clause size 2.517 2.139 6.394 6.394 Longest clause 200 200 66 66 Negative literals 80.66% 93.25% 19.67% 19.67% Horn clauses 0% 0% 0% 0.01% Table 2. Comparison of characteristics between easier and harder instances of the Hamiltonian path problem and ESCP. 4 It can be observed that the characteristics of hard instances depend primarily on the computational problem and its specific translation to SAT. For instance, in the ver- tex k-colouring problem, they have up to 50% more literals and clauses. This in turn can be attributed to the randomly generated input graphs for these instances having significantly more edges, and as such requiring more constraints in the form of clauses. Notably, despite the overwhelming majority of literals in both groups of instances being negative, there are no Horn clauses, again due to the specifics of the SAT encoding. On the other hand, for vertex k-cover the average number of clauses is roughly the same in easy and hard instances. However, the number of literals, and thus average clause length, is generally higher (up to 15%) in the latter group. Furthermore, an even more noteworthy difference is in the percentage of negative literals, which is also around 15% lower, suggesting that the extra literals in hard instances are positive. In the same way, Tab. 2 compares instances of the Hamiltonian path problem and the extended string-to-string correction problem (ESCP). For the former, generated graphs were of size n = 200, whereas the parameters for ESCP were set as n = 20 (length of input strings), k = 15 (maximum number of operations) and l = 5 (alphabet size). In the case of the Hamiltonian path problem, the harder instances actually have less literals and thus, on average, shorter clauses. This, too, is consistent with the nature of the problem in question: fewer edges (and thus shorter conditional clauses in the resultant formula) make for a graph that is harder to find a Hamiltonian path, while the most trivial satisfiable instance is actually one in which all possible edges exist. Finally, in the comparison of ESCP instances, all analysed characteristics are vir- tually identical, further emphasising the lack of any clear pattern behind the relative difficulty of specific instances of SAT. It is important to note that our analysis is not yet another attempt to back up the the findings previously described in Section 2, i.e., the existence of a phase transition and a ’constraint gap’ at the boundary between regions of expected (un)satisfiability. Instead of considering the distribution of difficulty across some order parameter, we took into account benchmarks generated using the same settings, i.e., the same order parameter, in an attempt to pinpoint patterns related to the composition of the formulas themselves. However, it clearly appears that the differences are related to the specific characteristics of computational problems translated to SAT. 5 Conclusions We have analysed Boolean formulas in CNF, representing translations of well-known NP-complete problems to SAT. The input files were grouped depending on their average processing time by SAT-solvers, and compared on several factors, including average clause length and percentage of Horn clauses, between the easiest and most difficult instances. There do not appear to be easily noticeable global characteristics of CNF formu- las representing harder instances of NP-complete problems. Depending on the specific problem and its translation to SAT, the formulas whose processing takes longer can, for instance, have longer clauses, or conversely, more clauses of similar average length to that in the ’easier’ group. Similarly, the percentage of negative literals or Horn clauses 5 are also dependent on the NP-complete problem translated to SAT, and not some pattern prevalent across all comparatively easier or harder instances. These observations seem in line with SAT being NP-complete, and as such, a dif- ficult computational problem. Just as there is not a single SAT-solver always offering superior performance, no single factor contributes to a particular instance of SAT being comparatively easier or harder to verify than others of same size. This was most evi- dent in the ESCP comparison: characteristics of both groups of instances were nearly identical, clearly showing that we cannot expect easy answers when it comes to hard computational problems. At least, not yet. References 1. P. A. Abdulla, P. Bjesse, and N. Eén. Symbolic reachability analysis based on SAT-solvers. In Proc. of the 6th Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’00), volume 1785 of LNCS, pages 411–425. Springer-Verlag, 2000. 2. A. Armando and L. Compagna. An optimized intruder model for SAT-based model-checking of security protocols. In Electronic Notes in Theoretical Computer Science, volume 125, pages 91–108. Elsevier Science Publishers, March 2005. 3. G. Audemard and L. Simon. Glucose and syrup in the SAT race 2015. SAT Race, 2015. 4. A. Biere. Lingeling and friends entering the SAT challenge 2012. Proceedings of SAT Challenge, pages 33–34, 2012. 5. A. Biere, A. Cimatti, E. M. Clarke, M. Fujita, and Y. Zhu. Symbolic model checking using SAT procedures instead of BDDs. In Proc. of the ACM/IEEE Design Automation Conference (DAC), pages 317–320, 1999. 6. P. Cheeseman, B. Kanefsky, and W. M. Taylor. Where the really hard problems are. In Proceedings of the 12th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI’91, pages 331–337, San Francisco, CA, USA, 1991. Morgan Kaufmann Publishers Inc. 7. E. Clarke, A. Biere, R. Raimi, and Y. Zhu. Bounded model checking using satisfiability solving. Formal Methods in System Design, 19(1):7–34, 2001. 8. J. M. Crawford and L. D. Auton. Experimental results on the crossover point in random 3-SAT. Artificial Intelligence, 81(1):31 – 57, 1996. Frontiers in Problem Solving: Phase Transitions and Complexity. 9. L. de Moura and N. Bjørner. Z3: An efficient SMT solver. In Proc. of TACAS’08, volume 4963 of LNCS, pages 337–340. Springer-Verlag, 2008. 10. M. Gebser, B. Kaufmann, A. Neumann, and T. Schaub. clasp : A conflict-driven answer set solver. In LPNMR, 2007. 11. I. P. Gent and T. Walsh. The hardest random SAT problems. In B. Nebel and L. Dreschler- Fischer, editors, KI-94: Advances in Artificial Intelligence, pages 355–366, Berlin, Heidel- berg, 1994. Springer Berlin Heidelberg. 12. Y. Hamadi, S. Jabbour, and L. Sais. ManySAT: a Parallel SAT Solver. JSAT, 6(4):245–262, 2009. 13. M. Kacprzak and W. Penczek. A SAT-based approach to unbounded model checking for alternating-time temporal epistemic logic. Synthese, 142:203–227, 2004. 14. R. Karp. Reducibility among combinatorial problems. In Complexity of Computer Compu- tations, pages 85–103. Plenum Press, 1972. 15. H. Kautz and B. Selman. Planning as satisfiability. In ECAI 92: Proceedings of the 10th European conference on Artificial intelligence, pages 359–363, 1992. 6 16. T. Larrabee and Y. Tsuji. Evidence for a satisfiability threshold for random 3CNF formulas. Technical report, 1993. 17. Y. Mahajan, Z. Fu, and S. Malik. Zchaff2004: An efficient sat solver. In Int. Conf. on Theory and Applications of Satisfiability Testing, pages 360–375. Springer, 2004. 18. A. Niewiadomski, W. Penczek, A. Półrola, M. Szreter, and A. Zbrzezny. Towards automatic composition of web services: SAT-based concretisation of abstract scenarios. Fundam. In- form., 120(2):181–203, 2012. 19. A. Niewiadomski, W. Penczek, and T. Sidoruk. Comparing Efficiency of Modern SAT- solvers for Selected Problems in P, NP, PSPACE, and EXPTIME. In Proceedings of the 26th International Workshop on Concurrency, Specification and Programming, Warsaw, Poland, September 25-27, 2017, pages 1–12, 2017. 20. A. Niewiadomski, P. Switalski, W. Penczek, and T. Sidoruk. Applying Modern SAT-solvers to Solving Hard Problems. Fundamenta Informaticae (submitted), 2018. 21. A. Niewiadomski, P. Switalski, W. Penczek, and T. Sidoruk. SMT-solvers in action: encoding and solving selected problems in NP and EXPTIME. Scientific Annals of Computer Science (submitted), 2018. 22. N. Sorensson and N. Een. Minisat v1. 13 - a SAT solver with conflict-clause minimization. SAT, 2005(53):1–2, 2005. 23. J. S. Turner. Almost all k-colorable graphs are easy to color. Journal of Algorithms, 9(1):63 – 82, 1988. 24. B. Woźna, A. Zbrzezny, and W. Penczek. Checking reachability properties for Timed Au- tomata via SAT. Fundamenta Informaticae, 55(2):223–241, 2003. 25. B. Woźna-Szczesniak. SAT-based bounded model checking for weighted deontic interpreted systems. Fundam. Inform., 143(1-2):173–205, 2016. 7