<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scrambling and Descrambling SMT-LIB Benchmarks</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Uppsala University</institution>
          ,
          <addr-line>Uppsala</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Each year, the benchmarks for the SMT Competition are drawn from a known pool of benchmarks, the SMT Library. Competing solvers, rather than determine benchmark satis ability from scratch, could thus cheat by simply looking up the correct answer for each benchmark in the library. To make this form of cheating more di cult, benchmarks in the SMT Competition are scrambled. We demonstrate that the current scrambling algorithm, which has been in use since 2011, is ine ective at obscuring the original benchmark. We propose an improved scrambling algorithm, and show that the problem of identifying the original benchmark under this improved algorithm is GI-complete.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The International Satis ability Modulo Theories Competition (SMT-COMP) is an annual
competition between SMT solvers [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The rst competition was held in 2005 as a satellite event
of the International Conference on Computer-Aided Veri cation (CAV). In 2015, the 10th
installment of SMT-COMP was part of the SMT Workshop, again a liated with CAV. (In 2013,
an evaluation was conducted instead of a competition [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].) The SMT-COMP series has been
instrumental in encouraging adoption of the common SMT-LIB input format [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which is now
the de facto standard for the input language of SMT solvers, and has spurred substantial
advances in solver technology. In this regard, it is comparable to other competitions in automated
theorem proving, such as the CADE ATP System Competition (CASC) [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ] and the SAT
Competition [
        <xref ref-type="bibr" rid="ref12 ref14">12, 14</xref>
        ].
      </p>
      <p>The SMT Competition exercises solvers in several tracks and logic divisions. The precise
number of tracks and divisions has changed over the years. In this paper, we focus on the
competition's Main Track, where solvers are applied to individual benchmarks and expected to
produce a single answer for each benchmark, either satis able (sat) or unsatis able (unsat).</p>
      <p>
        Benchmarks for the Main Track (and indeed for all competition tracks) are drawn from the
SMT Library [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], a large library of input problems written in the SMT-LIB language. New
benchmarks are regularly added to the SMT Library; the number of benchmarks used in the
competition has grown from a modest 1,360 in 2005 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to 154,238 in 2015 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Benchmarks
may be submitted by anyone, including solver developers. To reduce the bias that this might
otherwise introduce into competition results, it is a staple rule of SMT-COMP to make all
eligible benchmarks publicly available some time before the competition starts. Solver
developers are explicitly encouraged to scrutinize the benchmarks, and to test and tune their solvers
accordingly. Benchmarks whose status (i.e., sat or unsat) is unknown are not eligible for the
competition.
      </p>
      <p>The fact that benchmarks are known in advance has potential for abuse. One could create
a \solver" that uses its input benchmark as a search key, and simply looks up and reports the
benchmark's known status in the SMT Library. Such a solver should perform very well on
all benchmarks in the library|much better than genuine solvers, which employ some (semi-)
decision procedure for the benchmark's logic to determine satis ability from scratch|but it
could not solve any new benchmarks, and would thus be all but useless for practical applications.
( s e t l o g i c UFNIA )
( s e t i n f o : s t a t u s u n s a t )
( declare fun f ( I n t I n t ) I n t )
( declare fun x ( ) I n t )
( a s s e r t ( f o r a l l ( ( y I n t ) ) (&lt; ( f y y ) y ) ) )
( a s s e r t (&gt; x 0 ) )
( a s s e r t (&gt; ( f x x ) ( 2 x ) ) )
( check sat )
( e x i t )
( s e t l o g i c UFNIA )
( declare fun x2 ( ) I n t )
( declare fun x1 ( I n t I n t ) I n t )
( a s s e r t (&lt; ( x2 2) ( x1 x2 x2 ) ) )
( a s s e r t (&gt; x2 0 ) )
( a s s e r t ( f o r a l l ( ( x3 I n t ) ) (&gt; x3 ( x1 x3 x3 ) ) ) )
( check sat )
( e x i t )</p>
      <p>
        The rules of the SMT Competition explicitly disallow such solvers: \[S]olvers must not rely on
previously determined identifying syntactic characteristics of competition benchmarks in testing
satis ability. Violation of this rule is considered cheating." [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
      </p>
      <p>
        To make this form of cheating more di cult, benchmarks in the competition are lightly
scrambled before they are presented to solvers. Figure 1 shows an original benchmark on the left,
and a scrambled version of the benchmark on the right. Note that benchmark scrambling has
di erent goals than benchmark pre-processing [
        <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
        ]: while both must preserve (un)satis ability,
pre-processing usually aims to simplify the benchmark in a way that reduces solving time.
Scrambling, on the other hand, should ideally have little to no impact on solving time, so that
is does not distort the competition results. We describe the benchmark scrambler that has been
used at recent SMT Competitions in more detail in Section 2.
      </p>
      <p>Scrambling prevents solvers from directly looking up their (scrambled) input benchmark in
the SMT Library. However, depending on the scrambling algorithm used, it may still be possible
to identify the original benchmark. In Section 3, we show that for the current scrambling
algorithm, this can in fact be done with very little computational e ort. The current scrambler is
therefore ine ective at obscuring the original benchmark. We exploit this weakness in Section 4
by implementing a cheating SMT solver that outperforms the best solvers in the 2015 SMT
Competition by two orders of magnitude. Moreover, in Section 5 we identify all benchmarks in
the SMT Library that are scrambled versions of each other.</p>
      <p>
        Motivated by this shortcoming of the current scrambling algorithm, we identify general
requirements on scrambling algorithms in Section 6, and present an improved algorithm that
makes it harder to identify the original benchmark. How much harder exactly? We show that
the problem of identifying the original benchmark under the improved algorithm is as hard as
the graph isomorphism problem (GI) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], i.e., GI-complete. We evaluate an implementation
of the improved algorithm on the entire SMT Library, and nd that it is su ciently e cient to
be used in future SMT Competitions. Section 7 concludes.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Benchmark Scrambling</title>
      <p>
        The current benchmark scrambler [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] was originally developed by Alberto Griggio. It has (with
minor modi cations) been used at every SMT Competition since 2011. To our knowledge, the
high-level ideas behind its implementation have not been described before.
      </p>
      <p>
        The scrambler is written in C++. It uses a Flex/Bison parser to transform SMT-LIB
benchmarks into an abstract syntax tree. The tree is then printed again in concrete SMT-LIB
syntax. Readers unfamiliar with the SMT-LIB language are referred to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for a comprehensive
and authoritative description.
      </p>
      <p>The following modi cations are performed by the scrambler during parsing or printing:
2. Input names, in the order in which they are encountered during parsing, are replaced by
names of the form x1, x2, . . . .
3. Variables bound by the same binder ( let , forall , or exists ) are shu ed.
4. Arguments to commutative operators (e.g., and, distinct , +) are shu ed.
5. Anti-symmetric operators (e.g., &lt;, bvslt ) are randomly replaced by their counterparts
(e.g., &gt;, bvsgt), with the arguments suitably swapped.
6. Consecutive declarations (i.e., declare fun commands) are shu ed.</p>
      <p>7. Consecutive assertions (i.e., assert commands) are shu ed.</p>
      <p>Steps 4 and 5 are performed only if the resulting term is syntactically valid according to the
benchmark's logic. Speci cally, in di erence logics, which impose substantial restrictions on
permitted terms, some operators are exempt from these two transformations.</p>
      <p>All shu ing and swapping, i.e., steps 3{7 above, is e ected by pseudo-random choices. For
instance, each occurrence of an anti-symmetric operator is ipped with probability
approximately 0.5. Therefore, each benchmark can typically be scrambled in many di erent ways.
The scrambler's actual output is determined by the seed value that is used to initialize the
pseudo-random number generator.1</p>
      <p>The scrambler supports further options, e.g., to deal with term annotations [3, x3.6]. These
are not relevant for the Main Track of the competition.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Benchmark Normalization</title>
      <p>Benchmark scrambling, as described in Section 2, is evidently a non-injective transformation.
For instance, two benchmarks that di er only in comments or names will generate the same
scrambled output (for any given seed). Therefore, it is not possible in general to undo the
scrambling, i.e., to compute the original benchmark from the scrambler's output alone.</p>
      <p>In the context of the SMT Competition, however, additional information is available: we can
access the SMT Library, i.e., the collection of original benchmarks. This may still be insu cient
to identify the original benchmark, as there could be several benchmarks in the SMT Library
that would result in the same output when scrambled. (We quantify this phenomenon in
Section 5.) But note that scrambling does not change the benchmark status: the scrambled
output is (un)satis able if and only if the original benchmark is (un)satis able. Therefore,
all possible original benchmarks are equisatis able, and for the purpose of cheating in the
competition it is su cient to identify any one of them.</p>
      <p>To this end, we have implemented a benchmark normalizer.2 The normalizer is applied to
both original and scrambled benchmarks. It computes normal forms for these, such that the
diagram in Figure 2 commutes.</p>
      <p>We obtain the benchmark normalizer as a modi cation of the scrambler, by replacing all
pseudo-random transformations (steps 3{7 in Section 2) with deterministic transformations that
generate the same output regardless of the order of input arguments. In other words, where
1This seed value is computed after the SMT Competition's solver submission deadline each year, according
to rules designed to make its value di cult to predict both for participants and competition organizers.</p>
      <p>2Our implementations, also of the SMT solver described in Section 4 and of the improved scrambler described
in Section 6, are available at http://user.it.uu.se/~tjawe125/publications/weber16scrambling.html. The
evaluation data reported on in Sections 4-6 of this paper are available at the same URL.</p>
      <sec id="sec-3-1">
        <title>Original benchmark</title>
      </sec>
      <sec id="sec-3-2">
        <title>Scrambling</title>
        <p>/</p>
      </sec>
      <sec id="sec-3-3">
        <title>Scrambled benchmark</title>
      </sec>
      <sec id="sec-3-4">
        <title>Normalization</title>
        <p>+
r</p>
      </sec>
      <sec id="sec-3-5">
        <title>Normalization'</title>
      </sec>
      <sec id="sec-3-6">
        <title>Normalized benchmark</title>
        <p>the scrambler shu es, the normalizer sorts. Speci cally, the normalizer performs the following
operations:
1. Comments, set info commands and other artifacts that have no logical e ect (e.g.,
super uous white-space, redundant parentheses) are removed.
2. When the normalizer is applied to an original benchmark, input names, in the order in
which they are encountered during parsing, are replaced by names of the form x1, x2, . . . .</p>
        <p>When the normalizer is applied to a scrambled benchmark, input names are retained.
3. Variables bound by the same binder ( let , forall , or exists ) are sorted.
4. Arguments to commutative operators (e.g., and, distinct , +) are sorted.
5. For each pair of anti-symmetric operators (e.g., &lt; and &gt;), we have chosen a canonical
representation. All occurrences of the non-canonical operator are eliminated.
6. Consecutive declarations (i.e., declare fun commands) are sorted.
7. Consecutive assertions (i.e., assert commands) are sorted.</p>
        <p>Sorting is performed with respect to a xed total order on nodes of the abstract syntax
tree. We sort nodes according to the symbol that they contain, and nodes with equal symbols
according to a lexicographic ordering of their children. Any other total order could be used
instead.</p>
        <p>Note that the normalizer (just like the scrambler) applies many of these operations already
during parsing, in a bottom-up fashion. Consequently, subterms are already in normal form
when they are considered as constituents of larger terms. Therefore, ipping anti-symmetric
operators (step 5) does not interfere with the sorting of arguments to commutative operators
(step 4), regardless of the total order used.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The World's Fastest SMT Solver</title>
      <p>To evaluate the performance of benchmark normalization and its feasibility in the context of
the SMT Competition, we have implemented a cheating SMT solver. First, we normalized all
154,238 benchmarks in the SMT Library that were used in the Main Track of the 2015 SMT
Competition. For each normal form we computed its SHA-512 hash digest (there were no hash
collisions), and used this to prime our solver with a map from digests to benchmark status.
Note that this computation can be done o ine, i.e., before the start of the competition.</p>
      <p>
        We then applied our solver to all Main Track benchmarks, using the same StarExec [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
execution environment that was used for the 2015 SMT Competition. As in the competition,
evaluation machines were equipped with Intel Xeon CPUs (E5-2609, 10 MB cache, 2.4 GHz),
(a) Run-time comparison for each benchmark.
(b) Run-times plotted against the number of
benchmarks solved.
memory was capped at 61,440 MB, and a time limit of 2,400 seconds per benchmark was
imposed. Benchmarks were scrambled with the competition scrambler, using the same seed
value (53689572) that was used in the competition, before they were presented to our solver.
The choice of seed value is largely irrelevant. The seed is only passed to the competition
scrambler|but not to our solver|and any other value should lead to similar evaluation results.
      </p>
      <p>Our solver normalizes each scrambled benchmark (retaining input names), computes the
SHA-512 digest of the normal form, and uses this to look up the benchmark's status in the
pre-computed map.</p>
      <p>Figure 3 shows the solver's performance, and compares it to that of a virtual best solver,
which is obtained by using, for each benchmark, the best performance of any solver that
participated in the 2015 SMT Competition. While the virtual best solver outperforms our cheating
solver on some (mostly very easy) benchmarks, our solver gets ahead of the virtual best solver
after 0.02 seconds. It returns a correct result for every competition benchmark; in fact, it
takes at most 37 seconds to solve each benchmark. In contrast, the virtual best solver times
out on 482 benchmarks. The total run-time of our solver on all competition benchmarks is
7,729 seconds|approximately a 223-fold speedup over the virtual best solver, which ran for
1,721,874 seconds total (including timeouts).</p>
      <p>We conclude that the scrambling algorithm that has been in use at SMT Competitions since
2011 does not create signi cant computational obstacles to identifying the original benchmark.
Even for the largest benchmarks used in the competition, normalization is feasible, and is
usually orders of magnitude faster than solving the benchmark from scratch.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Benchmark Similarities in the SMT Library</title>
      <p>All Main Track benchmarks contain exactly one check sat command. Such benchmarks are
called non-incremental. The SMT Library currently contains 196,375 non-incremental
benchmarks. (Of these, 30,717 were not eligible for the SMT Competition in 2015 because their
status was unknown, or because of syntactic restrictions. We include these benchmarks in our
10
100</p>
      <p>1000
Size (benchmarks)
10</p>
      <p>100
Size (benchmarks)
1000
(a) Identical benchmarks in SMT-LIB.
(b) Similar benchmarks in SMT-LIB.
analysis in this section.)</p>
      <p>By computing and comparing the SHA-512 digests of all non-incremental benchmarks, we
nd that 21,084 of these benchmarks are duplicates. Some benchmarks have been copied many
times; the front-runner appears 885 times under di erent benchmark names. Figure 4a shows
in more detail how the SMT Library can be partitioned into equivalence classes of syntactically
equal benchmarks, according to the number of times that each benchmark appears in the library.</p>
      <p>The normalizer that was presented in Section 3 allows us to repeat this investigation with
a coarser equivalence relation in place of syntactic identity: namely, we call two benchmarks
similar if they have the same normal form. Under this equivalence relation, which would
consider all scrambled versions of a benchmark similar to the original, we nd that 30,799
non-incremental benchmarks are duplicates. There are up to 1,499 similar versions of a single
benchmark. Figure 4b shows the partition of the SMT Library into equivalence classes for this
notion of similarity.</p>
      <p>Interestingly, there are 119 non-incremental benchmarks in the SMT Library whose status
is currently unknown, but that are similar to benchmarks with known status (and thus must
be equisatis able). This suggests that benchmark normalization may have some legitimate use
in SMT solving after all.</p>
      <p>We suggest that duplicate benchmarks be removed from the SMT Library. To avoid giving
undue emphasis to speci c benchmarks, usually only one version of similar benchmarks should
be retained. Moreover, we suggest that new benchmarks should be added to the library only
after their dissimilarity from existing benchmarks has been established.
6</p>
    </sec>
    <sec id="sec-6">
      <title>A GI-Complete Scrambling Algorithm</title>
      <p>
        Is there a better scrambling algorithm, which is not as easy to sidestep? Before we can answer
this question, we identify the following requirements on a good scrambling algorithm.
1. The algorithm must not a ect satis ability. Rationale: This fundamental correctness
property ensures that the status of the scrambled benchmark is known. It is also necessary
to argue that the scrambled output is a variation of the original problem.
2. The algorithm must be e cient. Rationale: Scrambling must not become a bottleneck in
the SMT Competition. The largest benchmarks in the SMT Library have a size of several
gigabytes. We found that even algorithms whose complexity was quadratic in the size of
the benchmark were too slow to be useful in practice.
3. Ideally, the algorithm should not a ect solving times. Rationale: Any impact on
solving times would potentially distort the competition results. Unfortunately, this property
cannot be fully guaranteed in practice|SMT solvers use complex heuristics, where even
seemingly innocuous changes like renaming of variables may have tremendous impact [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
However, the algorithm should certainly not alter the benchmark size or structure in any
signi cant way.
4. Given two benchmarks, it should be hard to decide without additional information (such as
the seed used for scrambling) whether one is a scrambled version of the other. Rationale:
This property ensures that scrambling makes it di cult to cheat in the SMT
Competition, i.e., that given a scrambled benchmark, one cannot easily identify the corresponding
original benchmark in the SMT Library.
      </p>
      <p>A good scrambling algorithm, therefore, is akin to a one-way function (i.e., a function that is
easy to compute on every input, but hard to invert given the image of a random input) that
preserves benchmark satis ability, size and structure.</p>
      <p>
        The existence of one-way functions is a famous open problem [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The scrambling algorithm
that has been in use at SMT Competitions since 2011 meets the rst three requirements above,
but|as our evaluation in Section 4 demonstrated|falls short of the fourth.
      </p>
      <p>To improve on this algorithm, we observe that the normalization algorithm presented in
Section 3 crucially relies on the fact that the replacement of input names with names of the
form x1, x2, . . . is entirely predictable, as it only depends on the order in which input names
appear in the original benchmark. We propose to amend the scrambling algorithm that was
described in Section 2 with an additional step as follows:
2a. A (pseudo-)random permutation is applied to all names x1, x2, . . . that occur in the
benchmark, replacing each name xi with (xi).</p>
      <p>With this addition, sorting a scrambled benchmark with respect to a xed term order is no
longer su cient to normalize the benchmark. The normalizer would also have to replace
scrambled names xi with their original names 1(xi). But (and the seed from which was derived)
is not known to the normalizer, and it cannot be reconstructed easily, since the other steps of
the scrambling algorithm obfuscate the connection between original and scrambled names. In
fact, we can prove the following theorem.</p>
      <p>Theorem 1. For the scrambling algorithm from Section 2, amended with a name-permuting
step as detailed above, the problem of determining whether (there exists a seed such that) two
benchmarks are scrambled versions of each other is GI-complete.</p>
      <p>
        Recall that GI is the class of problems that have a polynomial-time Turing reduction to the
graph isomorphism problem [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], i.e., the problem of determining whether two nite graphs are
isomorphic. This problem is of great interest in complexity theory: it lies in NP, but is not
known to be NP-complete, nor is it known to be solvable in polynomial time. The following
proof of Theorem 1 incorporates ideas by Tsuyoshi Ito [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Proof. To reduce the problem of determining whether two benchmarks are scrambled versions
of each other to graph isomorphism, consider an SMT-LIB benchmark S. Without loss of
generality we may assume that S consists of a single block of declarations followed by a single
block of assertions. (Otherwise, we apply the following construction separately to each block.)
As in step 5 of the normalization algorithm (Section 3), replace anti-symmetric operators in S
with their canonical representation. Then construct a vertex-labeled graph G(S) as follows.
1. For each input name x that occurs in S, the graph has a vertex vx labeled as \name."
2. For each assertion i in S, the graph contains a syntax tree for i. Arguments to
noncommutative operators in the syntax tree are suitably labeled (e.g., with non-negative
integers) to indicate their order. For commutative operators, no such labeling is present.
3. Syntax trees do not contain input names. Instead, for each occurrence of an input name x
in i, there is an edge connecting the appropriate parent in the syntax tree to vx.</p>
      <p>
        Two benchmarks S and T are scrambled versions of each other if and only if they can
be made equal by renaming, reordering arguments to commutative operators, and reordering
assertions. It is not di cult to see that this is the case if and only if there is an isomorphism
between G(S) and G(T ) that preserves labels. But the isomorphism problem of vertex-labeled
graphs is known to be equivalent to the isomorphism problem of unlabeled graphs [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. This
proves that the above problem is in GI.
      </p>
      <p>To reduce graph isomorphism to the problem of determining whether two benchmarks are
scrambled versions of each other, given a graph G = (V; E), construct an SMT-LIB benchmark
B(G) as follows. For each vertex v 2 V , B(G) contains a declaration (declare fun v () Bool).
For each edge e = fv1; v2g 2 E, B(G) contains an assertion (assert (= v1 v2 )). Then again it
is not di cult to see that two graphs G and H are isomorphic if and only if B(G) and B(H)
are scrambled versions of each other. Therefore, the above problem is also GI-hard.</p>
      <p>We have implemented this addition to the scrambling algorithm, and evaluated the amended
algorithm by scrambling all 196,375 non-incremental benchmarks in the SMT Library (using
the StarExec execution environment and seed value described in Section 4). Across the entire
SMT Library, the (unoptimized) amended implementation incurs 28% overhead when compared
to the current scrambler. The maximal scrambling time for a single benchmark increases from
1,289 seconds to 1,786 seconds. We conclude that the amended algorithm, albeit clearly slower
than the current scrambler, is su ciently e cient to be used in future SMT Competitions.
7</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>We have shown that the scrambling algorithm that has been in use to obscure SMT-LIB
benchmarks in SMT Competitions since 2011 is ine ective. A cheating solver that identi es the
original benchmark and looks up the benchmark's correct answer in the SMT Library
outperforms the best solvers in the 2015 SMT Competition by two orders of magnitude. This
also helped us to identify numerous duplicate benchmarks in the SMT Library, including 119
benchmarks whose status was previously unknown.</p>
      <p>Despite the relative technical ease with which the current scrambling algorithm can be
sidestepped, we have no reason to believe that this form of cheating has actually occurred at
past SMT Competitions. Winning solvers receive little beyond recognition by the research
community; consequently, there are strong social disincentives to submitting a cheating solver.
Moreover, most competing solvers|including those that performed best|are open source,
allowing their implementations to be scrutinized.</p>
      <p>We have proposed and implemented an improved scrambling algorithm. Under this
algorithm, identifying the original benchmark is GI-complete. Our implementation of the improved
algorithm is (on average) only 28% slower than the current scrambler on the SMT Library, and
we expect it to be used for the 2016 SMT Competition. Our results might also prompt the
organizers of other competitions that use previously known benchmarks, such as CASC and
SAT, to revisit the issue of benchmark scrambling.</p>
      <p>
        At this point, at least two open questions remain. First, it is unclear whether the improved
scrambling algorithm makes it practically infeasible to recognize the original benchmark.
Although no polynomial-time algorithm for solving GI is known, for many|if not all|existing
benchmarks in the SMT Library this may well be su ciently easy. Recent work by Babai [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
claims to show that GI is quasi-polynomial. Even with the improved scrambling algorithm,
the SMT Competition may thus have to rely on social disincentives and scrutiny more than on
technical measures to prevent this form of cheating.
      </p>
      <p>
        Second, is there a scrambling algorithm that meets the requirements of Section 6 and|for
theoretical or practical reasons|is better than the algorithm proposed here? In the
terminology of [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we have advocated moving from an \identity" scrambling algorithm, which only
shu es declarations, assertions and terms within assertions, to a \permutation" algorithm,
which additionally shu es names. Further transformations are conceivable, e.g.,
complementing (negating) Boolean or numerical symbols. We leave these questions for future work.
8
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>Sylvain Conchon and David Deharbe organized SMT-COMP 2015 together with the author of
this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Laszlo</given-names>
            <surname>Babai</surname>
          </string-name>
          .
          <article-title>Graph isomorphism in quasipolynomial time</article-title>
          .
          <source>CoRR, abs/1512.03547</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Clark</given-names>
            <surname>Barrett</surname>
          </string-name>
          , Morgan Deters, Leonardo Mendonca de Moura,
          <source>Albert Oliveras, and Aaron Stump</source>
          .
          <article-title>6 years of SMT-COMP</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          ,
          <volume>50</volume>
          (
          <issue>3</issue>
          ):
          <volume>243</volume>
          {
          <fpage>277</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Clark</given-names>
            <surname>Barrett</surname>
          </string-name>
          , Pascal Fontaine, and
          <string-name>
            <given-names>Cesare</given-names>
            <surname>Tinelli</surname>
          </string-name>
          .
          <source>The SMT-LIB Standard: Version 2.5. Technical report</source>
          , Department of Computer Science, The University of Iowa,
          <year>2015</year>
          . http://smtlib.cs. uiowa.edu/papers/smt-lib
          <source>-reference-v2</source>
          .
          <fpage>5</fpage>
          -r2015-
          <fpage>06</fpage>
          -28.pdf.
          <source>Retrieved April 13</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Richard</given-names>
            <surname>Bonichon</surname>
          </string-name>
          , David Deharbe,
          <string-name>
            <given-names>Pablo</given-names>
            <surname>Dobal</surname>
          </string-name>
          , and Claudia Tavares.
          <article-title>SMTpp: preprocessors and analyzers for SMT-LIB</article-title>
          .
          <source>In Proceedings of the 13th International Workshop on Satis ability Modulo Theories (SMT</source>
          <year>2015</year>
          ),
          <year>2015</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Franc</given-names>
            <surname>Brglez</surname>
          </string-name>
          , Xiao Yu Li,
          <string-name>
            <given-names>and Matthias F.</given-names>
            <surname>Stallmann</surname>
          </string-name>
          .
          <article-title>On SAT instance classes and a method for reliable performance experiments with SAT solvers</article-title>
          .
          <source>Annals of Mathematics and Arti cial Intelligence</source>
          ,
          <volume>43</volume>
          (
          <issue>1</issue>
          ):1{
          <fpage>34</fpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>David</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Cok</surname>
            , David Deharbe,
            <given-names>and Tjark</given-names>
          </string-name>
          <string-name>
            <surname>Weber</surname>
          </string-name>
          .
          <article-title>The 2014 SMT competition</article-title>
          .
          <source>Journal on Satis ability, Boolean Modeling and Computation</source>
          ,
          <volume>9</volume>
          :
          <fpage>207</fpage>
          {
          <fpage>242</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>David</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Cok</surname>
            , Aaron Stump, and
            <given-names>Tjark</given-names>
          </string-name>
          <string-name>
            <surname>Weber</surname>
          </string-name>
          .
          <article-title>The 2013 evaluation of SMT-COMP and SMTLIB</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          ,
          <volume>55</volume>
          (
          <issue>1</issue>
          ):
          <volume>61</volume>
          {
          <fpage>90</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Sylvain</given-names>
            <surname>Conchon</surname>
          </string-name>
          , David Deharbe,
          <string-name>
            <given-names>and Tjark</given-names>
            <surname>Weber</surname>
          </string-name>
          .
          <article-title>10th International Satis ability Modulo Theories Competition (SMT-COMP</article-title>
          <year>2015</year>
          )
          <article-title>: Rules and procedures</article-title>
          . http://smtcomp.sourceforge. net/
          <year>2015</year>
          /rules15.pdf.
          <source>Retrieved April 13</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>David</given-names>
            <surname>Deharbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pascal</given-names>
            <surname>Fontaine</surname>
          </string-name>
          , Stephan Merz, and Bruno Woltzenlogel Paleo.
          <article-title>Exploiting symmetry in SMT problems</article-title>
          . In Nikolaj Bj rner and Viorica Sofronie-Stokkermans, editors,
          <source>Automated Deduction - CADE-23 - 23rd International Conference on Automated Deduction, Wroclaw, Poland, July 31 - August 5</source>
          ,
          <year>2011</year>
          . Proceedings, volume
          <volume>6803</volume>
          of Lecture Notes in Computer Science, pages
          <volume>222</volume>
          {
          <fpage>236</fpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Oded</given-names>
            <surname>Goldreich</surname>
          </string-name>
          .
          <source>Foundations of Cryptography: Volume</source>
          <volume>1</volume>
          ,
          <string-name>
            <given-names>Basic</given-names>
            <surname>Tools</surname>
          </string-name>
          . Cambridge University Press,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Alberto</given-names>
            <surname>Griggio</surname>
          </string-name>
          .
          <article-title>SMT-COMP 2011 benchmark scrambler</article-title>
          . http://smtcomp.sourceforge.net/
          <year>2015</year>
          /smtcomp2015_scrambler.tar.gz.
          <source>Retrieved April 13</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Heule</surname>
          </string-name>
          and
          <string-name>
            <given-names>Torsten</given-names>
            <surname>Schaub</surname>
          </string-name>
          .
          <article-title>What's hot in the SAT and ASP competitions</article-title>
          .
          <source>In Blai Bonet and Sven Koenig</source>
          , editors,
          <source>Proceedings of the Twenty-Ninth AAAI Conference on Arti cial Intelligence, January 25-30</source>
          ,
          <year>2015</year>
          , Austin, Texas, USA., pages
          <volume>4322</volume>
          {
          <fpage>4323</fpage>
          . AAAI Press,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Tsuyoshi</given-names>
            <surname>Ito</surname>
          </string-name>
          .
          <article-title>What's the complexity of recognizing equivalence for the following relation? (Answer</article-title>
          ),
          <year>2014</year>
          . http://cstheory.stackexchange.com/questions/27287/
          <article-title>whats-the-complexity-of-recognizing-equivalence-for-the-following-relation</article-title>
          .
          <source>Retrieved April 13</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Matti</surname>
            <given-names>Ja</given-names>
          </string-name>
          rvisalo, Daniel Le Berre, Olivier Roussel, and
          <string-name>
            <given-names>Laurent</given-names>
            <surname>Simon</surname>
          </string-name>
          .
          <article-title>The international SAT solver competitions</article-title>
          .
          <source>AI Magazine</source>
          ,
          <volume>33</volume>
          (
          <issue>1</issue>
          ),
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kobler</surname>
          </string-name>
          ,
          <string-name>
            <surname>U.</surname>
          </string-name>
          <article-title>Schoning, and Jacobo Toran. The Graph Isomorphism Problem: Its Structural Complexity. Progress in Theoretical Computer Science</article-title>
          . Birkhauser, Boston,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>SMT-COMP 2015</surname>
          </string-name>
          <article-title>| benchmarks</article-title>
          . http://smtcomp.sourceforge.net/2015/benchmarks.shtml.
          <source>Retrieved April 13</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>SMT-LIB The</surname>
          </string-name>
          <article-title>Satis ability Modulo Theories Library</article-title>
          . http://smtlib.cs.uiowa.edu/ benchmarks.shtml.
          <source>Retrieved April 13</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Aaron</surname>
            <given-names>Stump</given-names>
          </string-name>
          , Geo Sutcli e, and Cesare Tinelli.
          <article-title>Starexec: A cross-community infrastructure for logic solving</article-title>
          . In Stephane Demri, Deepak Kapur, and Christoph Weidenbach, editors,
          <source>Automated Reasoning - 7th International Joint Conference, IJCAR</source>
          <year>2014</year>
          ,
          <article-title>Held as Part of the Vienna Summer of Logic</article-title>
          ,
          <source>VSL</source>
          <year>2014</year>
          , Vienna, Austria,
          <source>July 19-22</source>
          ,
          <year>2014</year>
          . Proceedings, volume
          <volume>8562</volume>
          of Lecture Notes in Computer Science, pages
          <volume>367</volume>
          {
          <fpage>373</fpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Sutcli e. The 7th IJCAR automated theorem proving system competition - CASC-J7</article-title>
          .
          <source>AI Communications</source>
          ,
          <volume>28</volume>
          (
          <issue>4</issue>
          ):
          <volume>683</volume>
          {
          <fpage>692</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Sutcli e and C. Suttner. The state of CASC</article-title>
          .
          <source>AI Communications</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ):
          <volume>35</volume>
          {
          <fpage>48</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Zemlyachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Korneenko</surname>
          </string-name>
          , and
          <string-name>
            <surname>R. I. Tyshkevich.</surname>
          </string-name>
          <article-title>Graph isomorphism problem</article-title>
          .
          <source>Journal of Soviet Mathematics</source>
          ,
          <volume>29</volume>
          (
          <issue>4</issue>
          ):
          <volume>1426</volume>
          {
          <fpage>1481</fpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>