A Mixed Integer Programming Approach for Verifying Properties of Binarized Neural Networks

Christopher Lazarus∗ and Mykel J. Kochenderfer
Stanford University
{clazarus, mykel}@stanford.edu

∗Contact Author. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Many approaches for verifying input-output properties of neural networks have been proposed recently. However, existing algorithms do not scale well to large networks. Recent work in the field of model compression studied binarized neural networks (BNNs), whose parameters and activations are binary. BNNs tend to exhibit a slight decrease in performance compared to their full-precision counterparts, but they can be easier to verify. This paper proposes a simple mixed integer programming formulation for BNN verification that leverages network structure. We demonstrate our approach by verifying properties of BNNs trained on the MNIST dataset and an aircraft collision avoidance controller. We compare the runtime of our approach against state-of-the-art verification algorithms for full-precision neural networks. The results suggest that the difficulty of training BNNs might be worth the reduction in runtime achieved by our verification algorithm.

1 Introduction

Neural networks have been shown to be susceptible to adversarial attacks [Papernot et al., 2016], often producing drastically different outputs when their input is slightly perturbed, which can be costly or dangerous. Multiple approaches to evaluate the robustness of networks to adversarial attacks have been developed. Many of these only provide statistical assessments and focus on evaluating the network on a large but finite collection of points in the input space. However, given that the input space is, in principle, infinite in cardinality, it is not viable to assess the output for all the points in the input space.

Recently, new approaches have emerged as an alternative to formally certify the input-output properties of neural networks [Liu et al., 2021]. Properties are often specified in the form of a statement that if the input belongs to a set X, then the output is in a set Y. In the context of control, this formulation can be used to verify that the network satisfies safety constraints; in the classification setting, it can be used to certify that points near a training sample are assigned the same label as that sample.

Verification algorithms are typically designed to be sound, which means that they will only conclude that a property holds if the property truly holds [Katz et al., 2017; Katz et al., 2019]. With the aim of improving efficiency, some approaches sacrifice completeness, meaning that even if a property holds, the algorithm might fail to identify it. Incomplete algorithms often rely on over-approximation, allowing them to scale to problems involving larger networks, high dimensional input spaces, or high dimensional output spaces.

Even when restricting the class of networks to those with Rectified Linear Unit (ReLU) activation functions (or even a more general piecewise-linear function), the verification problem has been shown to be NP-hard [Katz et al., 2017]. The hardness of the problem has motivated many different approaches [Liu et al., 2021], including reachability, optimization, and search algorithms. Even incomplete algorithms struggle to verify properties of networks with sizes commonly encountered in contemporary applications. In principle, simpler models should be easier to verify, and binarized neural networks (BNNs) [Hubara et al., 2016] are simpler than traditional full precision neural networks: both their parameters and their activations are binary.

Binarization is an extreme quantization scheme that can significantly reduce memory and computation requirements. However, binarization introduces non-differentiable and even non-continuous blocks into the computational graph of a network, which complicates the optimization used to train the network. Recent work, motivated by their applicability in highly constrained environments such as edge devices, has enabled BNNs to achieve performance comparable to traditional full precision networks [Hubara et al., 2016].

The reduced memory requirement and simplified computation resulting from this representation come with a drawback: binarized neural networks are harder to train. A major challenge is back propagating the gradient of the weights through sign functions. There are workarounds [Simons and Lee, 2019], such as using straight-through estimators (STE) [Hubara et al., 2016].
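The paper does not include training code; as a rough illustration of the straight-through estimator mentioned above, the sketch below follows the common PyTorch pattern of binarizing with sign in the forward pass while passing a clipped gradient through in the backward pass. The class name and the clipping threshold of 1 are conventional choices in the style of [Hubara et al., 2016], not details taken from this paper.

```python
import torch

class SignSTE(torch.autograd.Function):
    """Binarize with sign in the forward pass; use a straight-through
    estimator (clipped identity) for the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # sign with the convention sign(0) = +1
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # pass the incoming gradient through where |x| <= 1, zero it elsewhere
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

x = torch.randn(4, requires_grad=True)
y = SignSTE.apply(x)
y.sum().backward()   # x.grad is 1 where |x| <= 1 and 0 elsewhere
```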
This paper presents an approach to verify properties of BNNs that leverages their structure. The verification problem is formulated as a mixed integer programming (MIP) problem that encodes the input set X by constraining variables associated with the input layer of the network and the output set Y by constraining variables associated with the output layer of the network. In our approach, we leverage the binary nature of both the parameters and the activations of the network.

Experimentally, we show that our approach can verify properties of BNNs. Section 4 demonstrates the capabilities of our approach by formally assessing the adversarial robustness of BNNs trained on the MNIST dataset [LeCun and Cortes, 2010] and properties of a collision avoidance system [Katz et al., 2017]. We compare the runtime of our approach to that of the state-of-the-art implementation of a full precision network verification algorithm on the equivalent full precision networks. Our proposed approach significantly reduces verification runtime.

2 Background

2.1 Neural Networks

Consider F, a feedforward neural network with n layers, input $x \in D_x \subseteq \mathbb{R}^{k_0}$, and output $y \in D_y \subseteq \mathbb{R}^{k_n}$. Here, y = F(x) corresponds to evaluating the network on input x and obtaining output y, $k_0$ is the dimensionality of x, and $k_n$ is the dimensionality of y. We assume that all inputs and outputs are flattened to vectors.

Each layer in F is a function $f_i : \mathbb{R}^{k_{i-1}} \to \mathbb{R}^{k_i}$, where $k_i$ is the dimensionality of the hidden variable $z_i$ in layer i. Accordingly, $z_0 = x$ and $z_n = y$. The network can be represented by

$$F = f_n \circ f_{n-1} \circ \cdots \circ f_1 \quad (1)$$

and

$$z_i = f_i(z_{i-1}) = \sigma_i(W_i z_{i-1} + b_i) \quad (2)$$

where $W_i \in \mathbb{R}^{k_i \times k_{i-1}}$ is the weight matrix, $b_i \in \mathbb{R}^{k_i}$ is the bias vector, and $\sigma_i : \mathbb{R}^{k_i} \to \mathbb{R}^{k_i}$ is the activation function. Let $z_{i,j}$ be the value of the jth node in the ith layer, $w_{i,j} \in \mathbb{R}^{1 \times k_{i-1}}$ be the jth row of $W_i$, and $w_{i,j,k}$ be the kth entry in $w_{i,j}$. Given that activation functions are usually component-wise vector functions, we have

$$z_{i,j} = \sigma_{i,j}(w_{i,j} z_{i-1}) = \sigma_{i,j}\Big(\sum_k w_{i,j,k}\, z_{i-1,k}\Big) \quad (3)$$
$$\phantom{z_{i,j}} = \sigma_{i,j}(\hat{z}_{i,j}) \quad (4)$$

where $\hat{z}_i := W_i z_{i-1}$. We drop the bias terms in this analysis for compactness and without loss of generality.
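As a concrete reading of Eqs. (1)–(4), the following NumPy sketch evaluates a small feedforward network by composing its layers; the shapes, random weights, and the ReLU/identity activations are made up purely for illustration and are not from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(weights, biases, activations, x):
    """Evaluate F = f_n ∘ ... ∘ f_1 with z_0 = x and z_i = σ_i(W_i z_{i-1} + b_i)."""
    z = x
    for W, b, sigma in zip(weights, biases, activations):
        z = sigma(W @ z + b)
    return z

# a made-up 3-2-2 network
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((2, 2))]
biases = [np.zeros(2), np.zeros(2)]
y = forward(weights, biases, [relu, lambda v: v], np.array([1.0, -0.5, 2.0]))
print(y)
```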
2.2 Binarized Neural Networks

A binarized neural network is a network involving binary weights and activations [Hubara et al., 2016]. The goal of network binarization is to represent the floating-point weights W and the activations $z_{i,j}$ of a given layer using 1 bit. The parameters are represented by

$$Q(W) = \alpha B_W, \qquad Q(z) = \beta B_z \quad (5)$$

where $B_W$ and $B_z$ are binarized weights and binarized activations, with scaling factors $\alpha$ and $\beta$ used for batch normalization. The sign function is often used to compute $Q_W$ and $Q_z$:

$$\operatorname{sign}(x) = \begin{cases} +1, & \text{if } x \ge 0 \\ -1, & \text{otherwise.} \end{cases} \quad (6)$$

The above representation enables an easy implementation of batch normalization while keeping most parameters and operations binary. In this context, the arithmetic operations needed for a forward pass of a layer $z_i^b$ in a binarized network $F^b$ can be computed as

$$z_i^b = f_i(z_{i-1}^b) = \sigma_i\big(Q(W)_i\, z_{i-1}^b\big) \quad (7)$$
$$\phantom{z_i^b} = \sigma(\alpha\beta\, B_W \odot B_z) = \alpha\beta \operatorname{sign}(B_W \odot B_z) \quad (8)$$

where $\odot$ denotes the inner product for binary vectors, computed with the bitwise XNOR–Bitcount operation. In practice, a BNN is a composition of the following blocks.

Linear.
$$\hat{z}_i = Q_i z_{i-1} \quad (9)$$
where $Q_i \in \{-1, 1\}^{k_{i+1} \times k_i}$.

Batch Normalization.
$$\hat{z}_i = \alpha_{k_i}\left(\frac{y_i - \mu_{k_i}}{\sigma_{k_i}}\right) + \gamma_{k_i} \quad (10)$$
where $y = (y_1, \dots, y_{k_{i+1}})$ is the output of the preceding linear block and $\alpha_{k_i}, \gamma_{k_i}, \mu_{k_i}, \sigma_{k_i} \in \mathbb{R}$.

Activation Function.
$$z_i = \operatorname{sign}(\hat{z}_i) \quad (11)$$
where $\hat{z}_i \in \mathbb{R}^{k_{i+1}}$ and $z_i \in \{-1, 1\}^{k_{i+1}}$.

Some BNN architectures call for binarized inputs as well. Our verification approach does not require binarized inputs, but this requirement is easy to incorporate by representing floating or fixed point inputs as a combination of binary inputs, either by quantizing them or by directly using their binary representation.

Section 3 derives how each type of block can be encoded with linear constraints, enabling the formulation of a mixed-integer programming problem for verification purposes. The last layer of a network can be used in different ways, often using a softmax or an argmax operator. In either case, we can encode desired properties at the output of the layer before such functions, at the so-called logits.
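To make the XNOR–Bitcount identity behind Eq. (8) concrete, the short NumPy sketch below checks that the inner product of two ±1 vectors equals 2·popcount(XNOR) − k when the vectors are stored as bits with +1 ↦ 1 and −1 ↦ 0. The helper names and the bit convention are illustrative assumptions, not details from the paper.

```python
import numpy as np

def binarized_layer(Q, z_prev):
    """One binarized block per Eqs. (9) and (11): ẑ_i = Q_i z_{i-1}, z_i = sign(ẑ_i)."""
    z_hat = Q @ z_prev
    return np.where(z_hat >= 0, 1, -1)

def pm1_dot_via_xnor(w_bits, z_bits, k):
    """Inner product of ±1 vectors from their {0,1} encodings:
    w · z = 2 * popcount(XNOR(w, z)) - k."""
    matches = np.sum(~(w_bits ^ z_bits) & 1)   # popcount of the XNOR
    return 2 * int(matches) - k

rng = np.random.default_rng(0)
k = 8
w = rng.choice([-1, 1], size=k)
z = rng.choice([-1, 1], size=k)
w_bits, z_bits = (w > 0).astype(np.uint8), (z > 0).astype(np.uint8)
assert int(w @ z) == pm1_dot_via_xnor(w_bits, z_bits, k)
```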
2.3 Neural Network Verification

The verification problem consists of checking whether input-output relationships of a function hold [Liu et al., 2021]. A subset of the input space $X \subseteq D_x$ and a subset of the output space $Y \subseteq D_y$ are defined. In its most general form, solving the verification problem requires certifying whether the following holds:

$$x \in X \implies y = F(x) \in Y. \quad (12)$$

In general, the input and output sets could have any geometry, but they are often constrained by the verification algorithm. Our approach restricts the class of X and Y to closed polytopes, corresponding to intersections of half-spaces. These regions can be encoded as a conjunction of linear constraints, which is required for our mixed integer programming formulation.

Applications. Given that neural networks provide state-of-the-art performance for many tasks, such as perception and control [Katz et al., 2017], studying their robustness has attracted significant attention [Papernot et al., 2016]. In the context of image classification, a network F assigns a value to each of the possible labels in its training set, and the maximizer $\arg\max_i y_i$ is often used to impute the label of an input x. Consider an input $x_0$ with label $i^* \in \{1, \dots, k_n\}$. It would be desirable that $y_{i^*} > y_j$ for all $j \neq i^*$, which can be encoded with the following sets:

$$X = \{x \in D_x : \|x - x_0\|_p \le \epsilon\}, \quad (13)$$
$$Y = \{y \in D_y : y_{i^*} > y_j \ \forall j \neq i^*\}, \quad (14)$$

where $\epsilon$ is the radius of the allowable perturbation in the input. If $p = 1$ or $p = \infty$, we have linear constraints. Encoding the output set Y is not possible with a single linear program given that the maximum operator requires a disjunction of half-spaces. With our MIP formulation, the set can be encoded directly.

Full Precision Neural Network Verification
There are plenty of approaches to verifying properties of neural networks [Liu et al., 2021]. Some methods approach verification as an optimization problem, which is an idea that we use here. Many methods that only work for piecewise-linear functions implement a search strategy over the activation state of the nodes in the network. Some apply Satisfiability Modulo Theories (SMT) by iteratively searching for assignments that satisfy all given constraints while treating the non-linear constraints lazily. One such method is the Reluplex algorithm [Katz et al., 2017; Katz et al., 2019], which can be used to verify properties of binarized neural networks. However, without significant modification, it would not exploit the specific characteristics of BNNs.

Binarized Neural Network Verification
There are a few approaches designed specifically for verifying properties of BNNs. Some rely on SAT solvers by reducing the verification problem to a Boolean satisfiability problem [Narodytska et al., 2018; Narodytska et al., 2020], which limits their applicability exclusively to fully binarized networks (networks with exclusively binary parameters and activations). Recently, a SAT solver based approach that is able to handle BNN constraints to speed up the verification process was introduced [Jia and Rinard, 2020a]. Another approach, which can be applied to networks with both binary and full precision parameters and piecewise-linear activation functions, was recently introduced [Amir et al., 2021]; it is an SMT-based technique that extends the Reluplex algorithm [Katz et al., 2019] and includes various optimizations specifically designed to leverage BNNs.

Our approach relies on a simple mixed integer linear programming formulation, allowing us to handle both fully and partially binarized neural networks. A similar approach has been demonstrated to work well for DNNs [Tjeng et al., 2019], and MIP has also been used for reachability analysis [Lomuscio and Maganti, 2017]. Our main contribution is to apply MIP to the verification of BNNs.

3 Verification of BNNs using Mixed Integer Programming

Binarized neural networks are composed of piecewise-linear layers, which may be fully connected, convolutional, or average-pooling layers, with piecewise-linear activation functions such as ReLU, max, and sign. Other commonly used elements such as batch normalization, shortcut connections, and dropout can also be characterized as affine transformations at evaluation time.

In order to address the verification problem in Section 2.3, we encode each of the components of the network as a set of linear constraints and then apply existing MIP technology to solve it. We encode the input set and the linear equations that describe the forward pass of the network. We then encode $Y^C$, the complement of the output set Y, and search for a feasible assignment. Any feasible assignment corresponds to a counterexample of the desired property. If no feasible assignment is found, then we can conclude that the network maps all the points in X to Y, which certifies the property. Below we present the formulation of each block as a set of linear constraints.

Input and Output Sets. The verification problem consists of determining whether all the elements in an input set X map to an output set Y. In order to formulate that problem as a MIP, X must be expressible as a conjunction of linear constraints. Additionally, given that the MIP solver will search for a feasible assignment, we need to formulate $Y^C$ as another polyhedron, given that

$$x \in X \implies y = F(x) \in Y \quad (15)$$
$$\iff \quad (16)$$
$$x \in X \implies y = F(x) \notin Y^C. \quad (17)$$

Therefore, the input set and the complement of the output set have to be encoded as linear constraints on their corresponding variables.
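The paper does not include code for this step; as a rough sketch of how the sets in Eqs. (13), (14), and (17) translate into solver constraints, the function below uses the PuLP modeling library (an assumption; the paper only states that a mixed integer program solver is used). The helper name, variable names, and the fixed big-M constant are illustrative; in practice the constant would be derived from bounds on the logits.

```python
import pulp

def add_robustness_query(prob, x_vars, y_vars, x0, eps, i_star, big_m=1000.0):
    """Constrain x to the l_inf ball around x0 (Eq. 13) and y to the complement
    of Y = {y : y_{i*} > y_j for all j != i*} (Eqs. 14 and 17).
    A feasible assignment is then a candidate adversarial example."""
    # input set X: elementwise box constraints
    for xj, x0j in zip(x_vars, x0):
        prob += xj >= x0j - eps
        prob += xj <= x0j + eps
    # output set Y^C: at least one other logit matches or exceeds logit i*
    deltas = []
    for j, yj in enumerate(y_vars):
        if j == i_star:
            continue
        d = pulp.LpVariable(f"delta_{j}", cat="Binary")
        deltas.append(d)
        prob += yj >= y_vars[i_star] - big_m * (1 - d)
    prob += pulp.lpSum(deltas) >= 1
```

Here `x_vars` and `y_vars` stand for the MIP variables attached to the input and logit layers by the block encodings given next; a feasible solution to the resulting program is a counterexample, while infeasibility certifies the property.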
Linear. $\hat{z}_i = Q_i z_{i-1}$ with $Q_i \in \{-1, 1\}^{k_{i+1} \times k_i}$ is already linear and is encoded row by row:

$$\hat{z}_{i,j} = q_j^T z_{i-1}, \qquad j = 1, \dots, k_{i+1}, \quad (18)$$

where $q_j$ is the jth row of $Q_i$.

ReLU. $z_i = \operatorname{ReLU}(\hat{z}_i) = \max(0, \hat{z}_i)$. Given bounds $l \le \hat{z}_i \le u$ and a binary variable $\beta \in \{0, 1\}$, we can encode the block as:

$$z_i \le \hat{z}_i - l(1 - \beta) \quad (19)$$
$$z_i \ge \hat{z}_i \quad (20)$$
$$z_i \le \beta u \quad (21)$$
$$z_i \ge 0 \quad (22)$$
$$\beta \in \{0, 1\} \quad (23)$$

Sign. $z_i = \operatorname{sign}(\hat{z}_i)$, that is,

$$\hat{z}_i \ge 0 \implies z_i = 1 \quad (24)$$
$$\hat{z}_i < 0 \implies z_i = -1 \quad (25)$$

Given bounds $l \le \hat{z}_i \le u$, this can be formulated as

$$-1 \le z_i \quad (26)$$
$$z_i \le 1 \quad (27)$$
$$l \beta \le \hat{z}_i \quad (28)$$
$$\hat{z}_i \le u(1 - \beta) \quad (29)$$
$$z_i = 1 - 2\beta \quad (30)$$
$$\beta \in \{0, 1\} \quad (31)$$

Additionally, given the structure of the linear block that precedes the sign block and the fact that its input variables only take values in $\{-1, 1\}$, we can always easily compute valid bounds with $l \ge -k_i$ and $u \le k_i$.

Batch Normalization. The batch normalization blocks have the output of linear blocks as their input:

$$z_i = \alpha_{k_i}\left(\frac{y_i - \mu_{k_i}}{\sigma_{k_i}}\right) + \gamma_{k_i} \quad (32)$$

where $y = (y_1, \dots, y_{k_{i+1}})$ and $\alpha_{k_i}, \gamma_{k_i}, \mu_{k_i}, \sigma_{k_i} \in \mathbb{R}$, assuming $\sigma_{k_i} > 0$. This can be rewritten as

$$\sigma_{k_i} z_i = \alpha_{k_i} y_i - \alpha_{k_i} \mu_{k_i} + \sigma_{k_i} \gamma_{k_i}, \qquad i = 1, \dots, k_{i+1}, \quad (33)$$

which is a linear constraint.

Max. $y = \max(x_1, x_2, \dots, x_m)$ with $l_i \le x_i \le u_i$ can be formulated as a set of linear constraints in the following way:

$$y \le x_i + (1 - \beta_i)(u_{\max,-i} - l_i), \qquad i = 1, \dots, m \quad (34)$$
$$y \ge x_i, \qquad i = 1, \dots, m \quad (35)$$
$$\sum_{i=1}^m \beta_i = 1 \quad (36)$$
$$\beta_i \in \{0, 1\} \quad (37)$$

where $u_{\max,-i} = \max_{j \neq i} u_j$.

In practice, BNNs are implemented by repeated compositions of the blocks described above. In the case of BNNs, the usual order of the linear layer and batch normalization is exchanged, as suggested by [Hubara et al., 2016], but the MIP formulation is equivalent.

Using the above formulations, we proceeded to encode the verification problem by parsing a BNN to construct the constraints that represent its forward pass and the corresponding input and output set constraints. We then call a mixed integer program solver and search for a feasible solution. If a feasible solution is found, then it certifies that the property does not hold, and it serves as a counterexample. If the search terminates without finding a feasible solution, then, given the completeness of the procedure, we certify that the property holds.

We implemented a tool that parses networks and produces their corresponding encoding for the solver. In the following section, we present the experiments we used to evaluate our approach and compare it to traditional DNN verification tools.
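To tie the pieces together, the following is a minimal end-to-end sketch, again assuming PuLP with its bundled CBC solver rather than the authors' tool: it encodes a toy two-input BNN (one sign layer followed by a linear output layer) with the sign-block constraints (26)–(31), constrains the input to an ℓ∞ ball as in Eq. (13), asserts the complement of Y = {y_0 > y_1}, and checks feasibility. All weights and the ε value are made up for illustration.

```python
import pulp

# ---- a toy, fully made-up BNN: 2 inputs -> 2 sign units -> 2 logits ----
Q1 = [[1, -1], [-1, -1]]    # hidden linear block, entries in {-1, 1}
Q2 = [[1, -1], [-1, 1]]     # output linear block
x0, eps = [0.6, -0.2], 0.1  # center point and allowed l_inf perturbation

prob = pulp.LpProblem("bnn_robustness", pulp.LpMinimize)
prob += pulp.lpSum([])      # pure feasibility problem: constant zero objective

# input variables constrained to the l_inf ball around x0 (Eq. 13)
x = [pulp.LpVariable(f"x{k}", x0[k] - eps, x0[k] + eps) for k in range(2)]

# hidden layer: z_hat = Q1 x (Eq. 18), z = sign(z_hat) via Eqs. (26)-(31)
z = []
for j, row in enumerate(Q1):
    z_hat = pulp.lpSum(row[k] * x[k] for k in range(2))
    # interval-arithmetic bounds l <= z_hat <= u over the input box
    lo = sum(min(row[k] * (x0[k] - eps), row[k] * (x0[k] + eps)) for k in range(2))
    hi = sum(max(row[k] * (x0[k] - eps), row[k] * (x0[k] + eps)) for k in range(2))
    beta = pulp.LpVariable(f"beta{j}", cat="Binary")
    zj = pulp.LpVariable(f"z{j}", -1, 1)
    prob += z_hat >= lo * beta          # Eq. (28)
    prob += z_hat <= hi * (1 - beta)    # Eq. (29)
    prob += zj == 1 - 2 * beta          # Eq. (30)
    z.append(zj)

# output logits and the complement of Y = {y0 > y1}: search for y1 >= y0
y = [pulp.lpSum(Q2[j][k] * z[k] for k in range(2)) for j in range(2)]
prob += y[1] >= y[0]

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
if pulp.LpStatus[status] == "Infeasible":
    print("property verified: no adversarial example in the input box")
else:
    print("counterexample:", [v.value() for v in x])
```

For more than two output classes, the complement of Y requires the indicator-variable disjunction shown in the earlier sketch.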
4 Experiments

To demonstrate our approach, we performed a series of experiments and compared the performance of our approach to that of other publicly available verification algorithms. However, given that our algorithm is specifically designed for BNNs, we compare its performance to that of verification algorithms for DNNs. Even though the networks are not identical, we opted for this comparison to motivate the use of BNNs for tasks where DNNs are normally favored, given that we expect BNN verification to be significantly faster.

4.1 MNIST

We trained DNNs and BNNs on the MNIST dataset [LeCun and Cortes, 2010]. We did not tune hyper-parameters, and we used a very simple training setup. The test set accuracy of the DNN was 98.2% and that of the BNN 95.6%. We thresholded all the grayscale values to black and white, which is equivalent to adding a binarization block at the input of the network.

We verified robustness properties using the input and output sets defined in equations (13) and (14), allowing a maximum perturbation of ε around known input points using the p = ∞ norm. We set a time limit of 120 seconds for each property and report our results in Table 1.

          ε     Mean (s)   Max (s)   Timeout   Verified (%)   Data (%)
  BNN     0.1   0.223      3.21      0.00%     88.24          95.6
  DNN     0.1   5.47       28.12     0.05%     94.33          98.22
  BNN     0.3   0.194      4.54      0.00%     61.78          95.6
  DNN     0.3   7.12       41.33     1.02%     80.68          98.22

Table 1: MNIST results. The ε column indicates the maximum allowed perturbation that defines X. The Mean column corresponds to the average time needed for the properties that did not time out, the Max column shows the maximum time taken to verify a property that did not time out, and the Timeout column shows the proportion of properties that exceeded the time limit. The verified accuracy corresponds to the proportion of samples for which the input set X defined a property that was verified, whereas the data accuracy column shows the accuracy of the network evaluated on the test set.

4.2 ACAS

We used a network and a property introduced by Katz et al. [2017]. We trained a BNN version of the ACAS controller and tested a subset of Property 10 that only requires running one query. We sliced the state space by setting the time until loss of vertical separation τ = 5 and the previous advisory to clear-of-conflict. We fixed the speed of ownship and the speed of intruder values to simplify the property. For the BNN, we used 8 bits to represent each input and added a layer at the input to decrease the dimensionality to that of the original ACAS network. Quantizing the input can significantly alter the behavior of the controller; our goal was simply to assess the speedup of verification. We also trained versions of the network with full precision inputs, but neither satisfied the property. We report our results in Table 2.

                         Loss       Time (s)   Result
  BNN                    2174.43    2.37       violated
  full precision DNN     1203.44    41.44      holds
  8 bit BNN              1634.25    5.73       holds

Table 2: ACAS results.
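The ACAS experiment represents each input with 8 bits, but the exact encoding is not described in the paper. The sketch below shows one plausible scheme (uniform fixed-point quantization whose bits are exposed as ±1 features a BNN can consume directly), included purely for illustration; the function name, the input range, and the little-endian bit ordering are assumptions.

```python
import numpy as np

def to_pm1_bits(x, lo, hi, n_bits=8):
    """Quantize a scalar in [lo, hi] to an unsigned n-bit code and expose
    each bit as a ±1 feature. One plausible scheme, not the paper's exact one."""
    levels = 2 ** n_bits - 1
    code = int(round((np.clip(x, lo, hi) - lo) / (hi - lo) * levels))
    bits = [(code >> k) & 1 for k in range(n_bits)]   # little-endian bits
    return np.array([1 if b else -1 for b in bits])

print(to_pm1_bits(0.37, lo=0.0, hi=1.0))   # 8 features in {-1, +1}
```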
5 Conclusion

Our results indicate that our simple MIP approach for verifying properties of BNNs performs significantly faster than other methods do for DNNs. Training BNNs is challenging, but the reduction in verification cost should be considered and incentivizes their use for safety-critical applications that require verification of certain properties.

In our MNIST experiments, we were able to verify the robustness of networks with about a 10× reduction in verification time. Our ACAS experiments show that our approach is able to verify BNNs that implement controllers in about 20× reduced time. BNNs appear to be particularly well suited as controllers for safety critical systems.

Our proposed approach encodes BNNs as mixed integer linear programs and is able to verify properties of binarized neural networks and partially binarized neural networks. Our experiments indicate that this approach is about an order of magnitude faster than verifying properties of comparable full precision neural networks.

The ease with which we can verify BNNs should increase their use for safety critical applications. BNNs are harder to train, but the difficulty might be worth the cost given how much faster verification becomes, along with the efficient hardware implementations that they enable. The use of BNNs has some drawbacks and requires considerations such as how to handle non-binary inputs. Quantizing the inputs allows us to preserve the binary architecture but decreases the applicability of BNNs, because some applications might have continuous input domains that would be better modeled with floating point numbers. However, it appears that even verified networks with floating point parameters are potentially unsafe [Jia and Rinard, 2020b].

We used a general purpose mixed integer programming solver. A potential area of future research would be to design a MIP solver that exploits some of the BNN-specific characteristics of the problems to further decrease verification time. Another potential direction would be to train ternary BNNs in order to explore the impact of sparsification on runtime, given that the equations that encode ternary BNNs require fewer variables.
References

[Amir et al., 2021] Guy Amir, Haoze Wu, Clark Barrett, and Guy Katz. An SMT-based approach for verifying binarized neural networks. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2021.

[Hubara et al., 2016] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In Advances in Neural Information Processing Systems, 2016.

[Jia and Rinard, 2020a] Kai Jia and Martin Rinard. Efficient exact verification of binarized neural networks. In Advances in Neural Information Processing Systems, 2020.

[Jia and Rinard, 2020b] Kai Jia and Martin Rinard. Exploiting verified neural networks via floating point numerical error. CoRR, abs/2003.03021, 2020.

[Katz et al., 2017] Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 2017.

[Katz et al., 2019] Guy Katz, Derek A. Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, et al. The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 2019.

[LeCun and Cortes, 2010] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.

[Liu et al., 2021] Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett, and Mykel J. Kochenderfer. Algorithms for verifying deep neural networks. Foundations and Trends in Optimization, 4(3-4):244–404, 2021.

[Lomuscio and Maganti, 2017] Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. CoRR, abs/1706.07351, 2017.

[Narodytska et al., 2018] Nina Narodytska, Shiva Kasiviswanathan, Leonid Ryzhyk, Mooly Sagiv, and Toby Walsh. Verifying properties of binarized deep neural networks. In AAAI Conference on Artificial Intelligence (AAAI), 2018.

[Narodytska et al., 2020] Nina Narodytska, Hongce Zhang, Aarti Gupta, and Toby Walsh. In search for a SAT-friendly binarized neural network architecture. In International Conference on Learning Representations, 2020.

[Papernot et al., 2016] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy (EuroS&P), 2016.

[Simons and Lee, 2019] Taylor Simons and Dah-Jye Lee. A review of binarized neural networks. Electronics, 8(6), 2019.

[Tjeng et al., 2019] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, 2019.