<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Characteristic Subsets of SMT-LIB Benchmarks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Jakubův</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikoláš Janota</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew Reynolds</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Czech Technical University in Prague</institution>
          ,
          <addr-line>Prague, Czechia</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The University of Iowa</institution>
          ,
          <addr-line>Iowa City</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Innsbruck</institution>
          ,
          <addr-line>Innsbruck</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>A typical challenge faced when developing a parametrized solver is to evaluate a set of strategies over a set of benchmark problems. When the set of strategies is large, the evaluation is often done with a restricted time limit and/or on a smaller subset of problems in order to estimate the quality of the strategies in a reasonable time. Firstly, considering the standard SMT-LIB benchmarks, we ask how much the evaluation time limit and the benchmark size can be restricted while still obtaining reasonable performance results. Furthermore, we propose a method to construct a benchmark characteristic subset which faithfully characterizes all benchmark problems. To achieve this, we collect problem performance statistics and employ clustering methods. We evaluate the quality of our benchmark characteristic subsets on the task of best cover construction, and we compare the results with randomly selected benchmark subsets. We show that our method achieves a smaller relative error than random problem selection.</p>
      </abstract>
      <kwd-group>
        <kwd>Benchmarking</kwd>
        <kwd>Satisfiability Modulo Theories</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Clustering</kwd>
        <kwd>Benchmark Characteristic Subsets</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Optimizing the performance of a Satisfiability Modulo Theories (SMT) solver, like CVC4 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
or Z3 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], on a large set of problems, like the SMT-LIB library [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], is a time-consuming task.
This is because of the large number of problems in SMT-LIB (over 100000). Even with the help of
parallelization, it takes several hours to evaluate a single strategy over all benchmark problems
with a standard time limit (like 60 seconds). The situation gets even worse when more than one
strategy, or a parametrized strategy with different arguments, needs to be evaluated. Then it
is common practice to restrict the evaluation to a subset of problems and/or to decrease the
evaluation time limit. In this paper, we try to address the question of how much the evaluation
can be restricted so that the results are still plausible. Furthermore, we propose a method to construct
a small benchmark characteristic subset that faithfully represents all benchmark problems
for the sake of evaluation.
      </p>
      <p>Within a large problem library, one can expect a large number of syntactically similar problems
or problems with similar performance with respect to many strategies. Identification of similar
problems could help us to speed up the evaluation, as we can select a single representative
from each similarity class. To identify classes of similar problems, we propose to perform short
evaluation runs of several CVC4 strategies and to collect run-time statistics. Then, problems
with similar statistics might be considered similar. To evaluate this approach, we experiment
with the best cover construction, that is, a selection of strategies with the best performance on
a given set of problems. We construct the best cover on a smaller benchmark subset and we
compare its performance with the optimal cover constructed on all problems. This gives us a
quality measure of problem subset selection.</p>
      <p>The paper is structured as follows. Section 2 describes the proposed method of identification
of similar problems based on performance statistics, and how to construct a benchmark
characteristic subset. Section 3 describes the best cover construction task which is used for empirical
evaluation of the quality of constructed subsets in Section 4. We conclude in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Benchmark Characteristic Subsets by Clustering</title>
      <p>
        Restricting the evaluation of a solver to a smaller random subset of all benchmark problems
is a common and effective method used by developers. Considering a large set of benchmark
problems, like the SMT-LIB library [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], one can expect a significantly large number of very
similar or even duplicate problems. Random selection does not take problem similarities into
account and thus can lead to unnecessary and duplicate computations. In this section, we
propose a simple method for selecting a characteristic subset of benchmark problems based
on performance features and clustering algorithms. The desired property of this characteristic
subset is that it faithfully characterizes all benchmark problems. That is, any development,
like parameter tuning or scheduler construction, performed on this subset should yield results similar
to the same development performed on all benchmark problems, only faster.
      </p>
      <p>
        Clustering algorithms [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] are capable of dividing a set of entities into disjoint subsets, called
clusters, such that similar entities end up in the same cluster. We propose to apply clustering
algorithms to create a benchmark characteristic subset. Bearing in mind that the benchmark
characteristic subset should allow us to avoid duplicate computations, we cluster the benchmark
problems and we create the characteristic subset by selecting one problem from each cluster.
      </p>
      <p>
        To employ clustering algorithms, the entities, in our case benchmark problems, need to be
represented by numeric feature vectors. We could create the feature vectors from the syntactic
structure of the problems, similarly to the ENIGMA [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ] clause and problem features for
first-order logic, but in this work we propose an alternative: computing the feature vectors from
the runtime statistics of short probing solver runs. In particular, we run the CVC4 solver on all SMT-LIB
benchmark problems with a small abstract resource limit1. Independently of whether the problem
is solved or not, we collect the runtime statistics, and we extract selected statistic values (see
Appendix A) to construct the performance feature vector of the problem. We use only statistic
values that always lead to the same value across repeated runs; in particular, we do not use any
real-time measurements. Moreover, we repeat this process with different CVC4 strategies to
obtain longer and more descriptive feature vectors. Our performance features are similar to
the dynamic features found in the research literature [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>1We run CVC4 with the argument --rlimit=10000 to limit the resources (roughly the number of SAT conflicts).</p>
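      <p>To make the probing runs concrete, the statistics extraction can be sketched as follows. This is an illustrative sketch only: the "key, value" line format assumed by parse_stats, and the helper names themselves, are assumptions to be adapted to the output of a concrete CVC4 build.</p>
      <preformat>
```python
# Sketch: turn the statistics printed by a probing solver run into a
# fixed-order feature vector. The "key, value" line format is an assumption.

SELECTED_KEYS = [  # the statistic keys listed in Appendix A
    "sat::conflicts",
    "sat::decisions",
    "Instantiate::Instantiations_Total",
    "SharedTermsDatabase::termsCount",
    "resource::PreprocessStep",
    "resource::RewriteStep",
    "resource::resourceUnitsUsed",
]

def parse_stats(stats_output):
    """Parse 'key, value' lines into a dict, skipping non-numeric values."""
    stats = {}
    for line in stats_output.splitlines():
        if "," in line:
            key, _, value = line.partition(",")
            try:
                stats[key.strip()] = float(value.strip())
            except ValueError:
                pass  # ignore statistics that are not numbers
    return stats

def feature_vector(stats):
    """Fixed-order vector over the selected keys; missing keys become 0."""
    return [stats.get(k, 0.0) for k in SELECTED_KEYS]
```
      </preformat>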
      <p>
        Once the performance feature vectors for all benchmark problems are constructed, we employ
the k-means [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] clustering algorithm, one of the most common clustering methods. The
k-means algorithm divides the benchmark problems into k disjoint clusters, hence we set k to the
desired size of the benchmark characteristic subset. Briefly, the k-means algorithm works as
follows. Firstly, k points in the vector space, called cluster centroids, are selected. Different
variants of the k-means algorithm use different centroid initialization methods, with random
initialization being one of the common variants. In the next step, all distances between centroids
and feature vectors are computed, and each feature vector is assigned to the closest cluster
centroid. After that, the average vector of the vectors assigned to the same cluster is computed
for each cluster, and the cluster centroid is moved to the computed average. This process is
iterated until the centroids stop moving.
      </p>
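      <p>The iteration just described (random initialization, assignment to the nearest centroid, centroid update until a fixed point) can be sketched in plain Python as below. This is a minimal illustration; actual experiments would use an optimized library implementation such as the one in SciPy.</p>
      <preformat>
```python
import random

def kmeans(vectors, k, iters=100, seed=0):
    """Minimal k-means: random init, assign to nearest centroid, recompute."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random init
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: each vector goes to its closest centroid
        clusters = [[] for _ in range(k)]
        for v in vectors:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(v)
        # update step: move each centroid to the mean of its cluster
        new_centroids = []
        for i, cl in enumerate(clusters):
            if cl:
                new_centroids.append([sum(col) / len(cl) for col in zip(*cl)])
            else:
                new_centroids.append(centroids[i])  # keep empty-cluster centroid
        if new_centroids == centroids:  # centroids stopped moving: done
            break
        centroids = new_centroids
    return centroids, clusters
```
      </preformat>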
      <p>The above gives us the following method for the construction of the benchmark characteristic
subset. We construct the performance feature vectors for all benchmark problems by extracting
runtime statistics from short probing solver runs. As different features might have values in
different units, we normalize the vectors by dividing each feature by its standard deviation
across all feature values. Then we employ the k-means algorithm and construct k problem
clusters and their centroids. From each cluster, we select the problem whose performance feature
vector is closest to the cluster centroid. The problems selected in this way form the benchmark
characteristic subset of size k.</p>
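      <p>The normalization and representative-selection steps can be sketched as follows. The cluster labels and centroids are assumed to come from any k-means implementation (for instance scipy.cluster.vq.kmeans2); the function names here are illustrative.</p>
      <preformat>
```python
from statistics import pstdev

def normalize(vectors):
    """Divide each feature by its standard deviation across all problems."""
    cols = list(zip(*vectors))
    sds = [pstdev(col) or 1.0 for col in cols]  # guard constant features
    return [[x / sd for x, sd in zip(v, sds)] for v in vectors]

def representatives(vectors, labels, centroids):
    """Per cluster, the index of the vector closest to the cluster centroid."""
    best = {}  # cluster -> (squared distance, problem index)
    for i, (v, c) in enumerate(zip(vectors, labels)):
        d = sum((a - b) ** 2 for a, b in zip(v, centroids[c]))
        if c not in best or best[c][0] > d:
            best[c] = (d, i)
    return sorted(i for _, i in best.values())
```
      </preformat>
      <p>The returned indices identify the problems forming the characteristic subset.</p>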
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation Tasks: Greedy and Exact Covers</title>
      <p>To evaluate the quality of the benchmark characteristic subsets from Section 2, we consider the
task of best cover construction. Let P be the set of benchmark problems and let S be a set
of strategies. Suppose we evaluate all strategies S over the problems P. The coverage of S over P,
denoted coverage(S, P), is the count of problems solved by the strategies in S:</p>
      <p>coverage(S, P) = | ⋃ s∈S { p ∈ P : strategy s solves problem p } |</p>
      <p>The best cover of the strategies S over P of size n is a subset S0 ⊆ S with |S0| ≤ n and with
the highest possible coverage(S0, P).</p>
      <p>Suppose we construct the best cover S0 ⊆ S based only on a partial evaluation of the strategies
S over a subset P0 of the problems P. This is typically done when the set of strategies and the set of
problems are large. We can then compare the performance of the best cover constructed on P0
with the optimal performance of the best cover constructed with the full evaluation over P. The
performance of the two covers should be similar if P0 faithfully characterizes P. More formally,
let us fix the strategies S and the cover size n. Let S0 be the best cover over P0 and let Sbest be the
best cover over P. We will measure the difference between their coverages coverage(S0, P) and
coverage(Sbest, P) over all problems P. We usually do not have all strategies evaluated over all
problems, but for the sake of the evaluation in this paper we compute them.</p>
      <p>We consider two methods to construct the best covers. First, an approximate but fast method
called the greedy cover (Section 3.1). Second, an exact but slightly slower best cover construction using
an ILP solver (Section 3.2).</p>
      <p>Function greedy(P, S, n)
Input: set of problems P, set of strategies S, size n
Output: a subset of S of size at most n approximating the best cover
1: C ← ∅
2: while |C| &lt; n and (∃p ∈ P)(∃s ∈ S): strategy s solves problem p do
3:     for s ∈ S do // compute the problems P_s solved by each strategy s ∈ S
4:         P_s ← { p ∈ P : strategy s solves problem p }
5:     b ← argmax_{s ∈ S} |P_s| // obtain the best strategy on the current problems
6:     P ← P ∖ P_b // remove the problems solved by b
7:     C ← C ∪ {b}
8: return C</p>
      <p>Algorithm 1: Greedy cover algorithm.</p>
      <sec id="sec-3-1">
        <title>3.1. Greedy Cover</title>
        <p>The greedy cover algorithm is an approximate method for the construction of the best cover of
the strategies S over the problems P of size n. The algorithm greedy(P, S, n) (see Algorithm 1) proceeds
as follows. It starts with an empty cover C and finds the strategy s ∈ S which solves the most
problems from P. The strategy s is added to C and the problems solved by s are removed from
P. This process is repeated until C reaches the desired size n or no strategy
solves any of the remaining problems.</p>
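        <p>A direct Python transcription of this procedure can look as follows. This is a sketch; the helper name greedy_cover is illustrative, and solves stands for a lookup into precomputed evaluation results.</p>
        <preformat>
```python
def greedy_cover(problems, strategies, solves, n):
    """Greedy approximation of the best cover of size n.

    solves(s, p) -> bool is assumed to be a lookup into precomputed
    evaluation results (strategy s solves problem p within the limit).
    """
    remaining = set(problems)
    cover = []
    while n > len(cover):
        # pick the strategy solving the most remaining problems
        best, best_solved = None, set()
        for s in strategies:
            solved = {p for p in remaining if solves(s, p)}
            if len(solved) > len(best_solved):
                best, best_solved = s, solved
        if best is None:  # no strategy solves any remaining problem
            break
        cover.append(best)          # add the best strategy to the cover
        remaining -= best_solved    # drop the problems it solves
    return cover
```
        </preformat>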
        <p>We will measure the relative error of constructing the greedy cover (of size n) by a partial
evaluation of the strategies S over a subset P0 of all problems P using the standard formula below.
The experimental evaluation of the greedy cover construction on random subsets and on benchmark
characteristic subsets based on performance features is given in Section 4.</p>
        <p>error(P0, n) = 100 · | 1 − coverage(greedy(P0, S, n), P) / coverage(greedy(P, S, n), P) |</p>
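        <p>As a sanity check, the formula reduces to a one-line helper once the two coverage values are known (the function name is illustrative):</p>
        <preformat>
```python
def relative_error(cov_subset, cov_full):
    """100 * |1 - coverage(greedy(P0, S, n), P) / coverage(greedy(P, S, n), P)|.

    cov_subset: coverage on P of the cover built from the subset P0;
    cov_full:   coverage on P of the cover built from all of P.
    """
    return 100.0 * abs(1.0 - cov_subset / cov_full)
```
        </preformat>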
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Exact Cover</title>
        <p>
          Apart from the approximative greedy cover construction, we also consider an exact best cover
construction method based on the representation of the best cover problem by an Integer Linear
Programming (ILP) formulation. The ILP problems are solved by the Gurobi solver [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>We consider a maximum set cover formulation, i.e., without taking the solving
time into account. This means that given a set of strategies S, the set P_s of problems solved by each strategy
s ∈ S, and the size n of the cover, the goal is to select a set of strategies S′ ⊆ S with |S′| = n that
maximizes the total number of instances solved, i.e., maximizes the cardinality of ⋃ s∈S′ P_s.</p>
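        <p>In standard ILP notation, this maximum coverage formulation can be written as follows. This is a sketch consistent with the description above; the binary variable x_s selects strategy s and y_p records that problem p is covered.</p>
        <preformat>
```latex
\begin{aligned}
\max \quad &amp; \textstyle\sum_{p \in P} y_p \\
\text{s.t.} \quad &amp; y_p \le \textstyle\sum_{s \in S \,:\, p \in P_s} x_s
    &amp;&amp; \forall p \in P, \\
&amp; \textstyle\sum_{s \in S} x_s = n, \\
&amp; x_s, y_p \in \{0, 1\} &amp;&amp; \forall s \in S,\ \forall p \in P.
\end{aligned}
```
        </preformat>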
        <p>We remark that the exact cover maximization is typically only slightly better than the greedy
cover. However, interestingly, Gurobi has proven to be extremely efficient on these problems;
typically we observe solving times of around 1 s.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Evaluation</title>
      <p>In this section, we describe the experimental setup in Section 4.1. We experiment with the
best cover construction of random benchmark subsets in Section 4.2. In Section 4.3, we discuss
reasonable restrictions of the evaluation time limit. Mainly, in Section 4.4, we evaluate the
quality of benchmark characteristic subsets constructed using performance features and the
k-means clustering method introduced in Section 2.</p>
      <sec id="sec-4-1">
        <title>4.1. SMT-LIB Experiments Setup</title>
        <p>
          For our experiments, we consider the quantified fragment of the SMT-LIB library [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ]. This
gives us altogether 75814 problems from 21 different logics. We consider 23 different CVC4 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
strategies (see Appendix B) and Z3 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] in the default mode as another strategy. We evaluate all
strategies over all problems with a 60-second time limit. Any solver run terminating before the
time limit with the result sat or unsat is considered successful, and the problem is considered
solved by the respective strategy. Computing this database of results took around 6 days of real
time.2 Once we have this large database of results, we can easily compute partial results on
problem subsets or with a smaller time limit.
        </p>
        <p>
          Figure 1 graphically depicts benchmark problem properties. The bar graph on the left shows
the count of problems from different logics. The plot on the right visualizes the performance
feature vectors of the problems described in Section 2. The plot is produced by dimensionality
reduction to 2 dimensions with the t-SNE [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] algorithm. Each problem is represented by one
point colored with the color of its logic in the left bar graph. Problems with
similar feature vectors should appear close to each other in the plot. Here we just note that
nearby points tend to have the same color. That is, the problems from one logic have similar
performance feature vectors, which is to be expected.
        </p>
        <p>2All the experiments are done on a server with 28 hyperthreading Intel Xeon CPU @ 2.3GHz cores and 256 GB
of memory.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Random Benchmark Subsets</title>
        <p>In this section, we present an experiment on the best cover construction on randomly selected
subsets of benchmark problems. From all 75814 benchmark problems, we construct random
subsets of different sizes, ranging from 100 up to 75800 with a step of 100. Moreover, we construct
10 different collections of these subsets to measure the effect of the random selection. On every
benchmark subset, we construct the greedy and exact covers of the 14 different strategies, called
the validation strategies (see Section 4.1). We construct covers of different sizes (n ∈ {1, 2, . . . , 10}).
We evaluate the constructed covers on all benchmark problems and we measure the error as
described in Section 3.1 and Section 3.2.</p>
        <p>Figure 2 presents the error for the greedy cover (left) and exact cover (right) construction.
For every random subset size, we plot the worst case error (red solid line), and the average error
(dotted line). Given the 10 random subset collections and 10 different cover sizes, each value is
computed from 100 relative error values.</p>
        <p>We can see that with benchmark subsets bigger than 1000 problems (roughly 1.3% of the
benchmark problems), we obtain less than 2% error even in the worst case. With subsets bigger than
10000, the worst case error drops below 0.5%. From the relatively large difference between the
worst case and the average, we can conclude that the accuracy of the benchmark approximation
by a random subset depends considerably on the luck of the random selection. In Section 4.4
we show that performance features can help us reduce this dependence by constructing better
benchmark characteristic subsets.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Time Limit Evaluation</title>
        <p>In this section, we present basic statistics on the solver runtime. In particular, we are interested
in how many problems are solved with a decreased evaluation time limit. This information can
help developers to set time limits for evaluation experiments.</p>
        <p>For each of the 24 considered strategies, we measure how many problems are solved in
the i-th second, that is, the count of problems solved with a runtime between i − 1 and i seconds.
Figure 3 shows the results for two representative strategies, namely, the strategy that solves the
most problems in the first second (left) and the strategy that solves the most problems with a
runtime greater than 1 second (right). The corresponding strategies (cvc1 and cvc16) can be found in
Appendix B.</p>
        <p>From Figure 3, we can see that the majority of problems are solved within the first second
of runtime, and this is the case for all evaluated strategies. Note the logarithmic y-axis in
the figure. We have conducted experiments with greedy cover construction, similar to the
experiments in Section 4.2. The experiments show that the greedy cover constructed with the
evaluation restricted to a time limit of 1 second shows less than 0.25% error when compared
with the greedy covers constructed with the full time limit of 60 seconds. With a time limit of 10
seconds, we reach an error of less than 0.15%, and with a limit of 30 seconds the error drops
below 0.05%.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Performance Features Benchmark Characteristics</title>
        <p>In this section, we evaluate the quality of benchmark characteristic subsets constructed with
the performance features and k-means clustering from Section 2. We compare the relative error
of the construction of the best covers on benchmark characteristic subsets and on random
benchmark subsets. Out of the 24 strategies considered in this experiment (see Section 4.1),
we use 10 CVC4 strategies to construct the performance features of the benchmark problems. We
use only the remaining 14 strategies to construct the best covers, to provide an independent
validation set of strategies.</p>
        <p>The performance features are computed by short CVC4 runs with limited resources as
described in Section 2. Computing the performance features took less than 2 hours. We
construct benchmark characteristic subsets3 of different sizes corresponding to the sizes of the
random benchmark subsets from Section 4.2. In this case, however, we do not construct 10
different collections but only one subset for each size. On the benchmark characteristic subsets,
we compute both the greedy and exact covers of different sizes (n ∈ {1, 2, . . . , 10}), and we measure
the relative error as in Section 4.2.</p>
        <p>3We use scipy.org's implementation of k-means, namely the function scipy.cluster.vq.kmeans2 with
the parameter minit set to points.</p>
        <p>Figure 4 presents the worst case relative errors for greedy cover (left) and exact cover (right)
constructions. The red lines correspond to the relative error on random benchmark subsets, that
is, they correspond to the red lines from Figure 2, but this time the results are computed only
on the 14 validation strategies. The blue lines correspond to relative errors on the benchmark
characteristic subsets. We can see that in both cases, the relative errors on the benchmark
characteristic subsets are significantly lower, especially for smaller benchmark subsets. Even
with the smallest subset of size 100, the worst case error is only about 2%. Moreover, from size 600
we obtain a worst case error below 1%. Furthermore, the error on benchmark characteristic
subsets approaches the average error on random subsets (see the dotted lines in Figure 2). This
suggests that the construction of benchmark characteristic subsets depends less on chance than
random selection does.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>We propose and evaluate a method of constructing a benchmark characteristic subset that
represents the whole benchmark. We provide empirical evidence that our method, based on
clustering of problem performance feature vectors, gives better results than random benchmark
sampling. To evaluate our method, we use the task of best cover construction. However, we
believe that our method will prove useful for many other tasks. Furthermore, we provide an
experimental evaluation of how restricting the benchmark size and the evaluation time limit affects
the solver performance results. The characteristic subsets of the quantified problems from the SMT-LIB library
computed using the k-means clustering method are available for download.
Hence SMT users can directly benefit from the results published in this paper by speeding up
their solver evaluation.</p>
      <p>As future work, we would like to experiment with different methods of measuring the quality
of benchmark characteristic subsets, other than best cover construction. Moreover, we
would like to employ clustering algorithms other than k-means. Finally, we would like to
propose different methods of constructing problem characteristic feature vectors, for example,
extracting features directly from the problem syntax.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The results were supported by the Ministry of Education, Youth and Sports within the dedicated
program ERC CZ under the project POSTMAN no. LL1902, and by the ERC Project SMART
Starting Grant no. 714034. This scientific article is part of the RICAIP project that has received
funding from the European Union’s Horizon 2020 research and innovation programme under
grant agreement no. 857306.</p>
    </sec>
    <sec id="sec-7">
      <title>A. CVC4 Statistics Used as Performance Features</title>
      <p>We use the following statistic keys, obtained by running CVC4 with the option --stats, to construct
the problem performance feature vectors.
sat::conflicts
sat::decisions
Instantiate::Instantiations_Total
SharedTermsDatabase::termsCount
resource::PreprocessStep
resource::RewriteStep
resource::resourceUnitsUsed</p>
    </sec>
    <sec id="sec-8">
      <title>B. CVC4 Strategies Used in Experiments</title>
      <p>The following 23 CVC4 strategies, described as CVC4 command line arguments, are used in the
experiments. The 24th strategy is Z3 in the default mode. The first ten {cvc1, . . . , cvc10} are
used to construct the performance feature vectors of problems. The remaining 14 are used as
validation strategies.
--simplification=none --full-saturate-quant
--no-e-matching --full-saturate-quant
--relevant-triggers --full-saturate-quant
--trigger-sel=max --full-saturate-quant
--multi-trigger-when-single --full-saturate-quant
--multi-trigger-when-single --multi-trigger-priority --full-saturate-quant
--multi-trigger-cache --full-saturate-quant
--no-multi-trigger-linear --full-saturate-quant
--pre-skolem-quant --full-saturate-quant
--inst-when=full --full-saturate-quant
--no-e-matching --no-quant-cf --full-saturate-quant
--full-saturate-quant --quant-ind
--decision=internal --simplification=none --no-inst-no-entail \
--no-quant-cf --full-saturate-quant
--decision=internal --full-saturate-quant
--term-db-mode=relevant --full-saturate-quant
--fs-interleave --full-saturate-quant
--finite-model-find --mbqi=none
--finite-model-find --decision=internal
--finite-model-find --macros-quant --macros-quant-mode=all
--finite-model-find --uf-ss=no-minimal
--finite-model-find --fmf-inst-engine
--finite-model-find --decision=internal
--full-saturate-quant</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Conway</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Deters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hadarean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jovanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          , C. Tinelli, CVC4, in: CAV, volume
          <volume>6806</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2011</year>
          , pp.
          <fpage>171</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>L. M. de Moura</surname>
          </string-name>
          , N. Bjørner,
          <article-title>Z3: an efficient SMT solver</article-title>
          ,
          <source>in: TACAS</source>
          , volume
          <volume>4963</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2008</year>
          , pp.
          <fpage>337</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fontaine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tinelli</surname>
          </string-name>
          ,
          <article-title>The Satisfiability Modulo Theories Library (SMT-LIB)</article-title>
          ,
          <source>www.SMT-LIB.org</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stump</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tinelli</surname>
          </string-name>
          ,
          <source>The SMT-LIB Standard: Version 2.0</source>
          , in: A.
          <string-name>
            <surname>Gupta</surname>
          </string-name>
          , D. Kroening (Eds.),
          <source>Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (Edinburgh, UK)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Omran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Engelbrecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salman</surname>
          </string-name>
          ,
          <article-title>An overview of clustering methods</article-title>
          ,
          <source>Intell. Data Anal</source>
          .
          <volume>11</volume>
          (
          <year>2007</year>
          )
          <fpage>583</fpage>
          -
          <lpage>605</lpage>
          . doi:10.3233/IDA-2007-11602.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jakubův</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chvalovský</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Olšák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piotrowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Suda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Urban</surname>
          </string-name>
          ,
          <article-title>ENIGMA anonymous: Symbol-independent inference guiding machine (system description)</article-title>
          , in:
          <source>IJCAR (2)</source>
          , volume
          <volume>12167</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2020</year>
          , pp.
          <fpage>448</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jakubův</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Urban</surname>
          </string-name>
          ,
          <article-title>Enhancing ENIGMA given clause guidance</article-title>
          , in: CICM, volume
          <volume>11006</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2018</year>
          , pp.
          <fpage>118</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jakubův</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Urban</surname>
          </string-name>
          ,
          <article-title>ENIGMA: efficient learning-based inference guiding machine</article-title>
          , in: CICM, volume
          <volume>10383</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2017</year>
          , pp.
          <fpage>292</fpage>
          -
          <lpage>302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Bridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Holden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Paulson</surname>
          </string-name>
          ,
          <article-title>Machine learning for first-order theorem proving - learning to select a good heuristic</article-title>
          ,
          <source>J. Autom. Reason</source>
          .
          <volume>53</volume>
          (
          <year>2014</year>
          )
          <fpage>141</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rawson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Reger</surname>
          </string-name>
          ,
          <article-title>Dynamic strategy priority: Empower the strong and abandon the weak</article-title>
          ,
          in:
          <source>PAAR@FLoC</source>
          , volume
          <volume>2162</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2018</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Lloyd</surname>
          </string-name>
          ,
          <article-title>Least squares quantization in PCM</article-title>
          ,
          <source>IEEE Trans. Inf. Theory</source>
          <volume>28</volume>
          (
          <year>1982</year>
          )
          <fpage>129</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Gurobi Optimization, LLC</surname>
          </string-name>
          ,
          <source>Gurobi optimizer reference manual</source>
          ,
          <year>2021</year>
          . URL: http://www.gurobi.com.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Roweis</surname>
          </string-name>
          ,
          <article-title>Stochastic neighbor embedding</article-title>
          , in: NIPS, MIT Press,
          <year>2002</year>
          , pp.
          <fpage>833</fpage>
          -
          <lpage>840</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>