A Mixed Integer Programming Approach for Verifying Properties of Binarized Neural Networks

Christopher Lazarus∗ and Mykel J. Kochenderfer
Stanford University
{clazarus, mykel}@stanford.edu

∗Contact Author. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Many approaches for verifying input-output properties of neural networks have been proposed recently. However, existing algorithms do not scale well to large networks. Recent work in the field of model compression studied binarized neural networks (BNNs), whose parameters and activations are binary. BNNs tend to exhibit a slight decrease in performance compared to their full-precision counterparts, but they can be easier to verify. This paper proposes a simple mixed integer programming formulation for BNN verification that leverages network structure. We demonstrate our approach by verifying properties of BNNs trained on the MNIST dataset and an aircraft collision avoidance controller. We compare the runtime of our approach against state-of-the-art verification algorithms for full-precision neural networks. The results suggest that the difficulty of training BNNs might be worth the reduction in runtime achieved by our verification algorithm.

1 Introduction

Neural networks have been shown to be susceptible to adversarial attacks [Papernot et al., 2016], often producing drastically different outputs when their input is slightly perturbed, which can be costly or dangerous. Multiple approaches to evaluate the robustness of networks to adversarial attacks have been developed. Many of these only provide statistical assessments and focus on evaluating the network on a large but finite collection of points in the input space. However, given that the input space is, in principle, infinite in cardinality, it is not viable to assess the output for all the points in the input space.

Recently, new approaches have emerged as an alternative to formally certify the input-output properties of neural networks [Liu et al., 2021]. Properties are often specified in the form of a statement that if the input belongs to a set X, then the output is in a set Y. In the context of control, this formulation can be used to verify that the network satisfies safety constraints; in the classification setting, it can be used to certify that points near a training sample are assigned the same label as that sample.

Verification algorithms are typically designed to be sound, which means that they will only conclude that a property holds if the property truly holds [Katz et al., 2017; Katz et al., 2019]. With the aim of improving efficiency, some approaches sacrifice completeness, meaning that even if a property holds, the algorithm might fail to identify it. Incomplete algorithms often rely on over-approximation, allowing them to scale to problems involving larger networks, high dimensional input spaces, or high dimensional output spaces.

Even when restricting the class of networks to those with Rectified Linear Unit (ReLU) activation functions (or even a more general piecewise-linear function), the verification problem has been shown to be NP-hard [Katz et al., 2017]. The hardness of the problem has motivated many different approaches [Liu et al., 2021], including reachability, optimization, and search algorithms. Even incomplete algorithms struggle to verify properties of networks with sizes commonly encountered in contemporary applications. In principle, simpler models should be easier to verify, and binarized neural networks (BNNs) [Hubara et al., 2016] are simpler than traditional full precision neural networks: both their parameters and their activations are binary.

Binarization is an extreme quantization scheme that can significantly reduce memory and computation requirements. However, binarization introduces non-differentiable and even non-continuous blocks into the computational graph of a network, which complicates the optimization used to train the network. Recent work, motivated by their applicability in highly constrained environments such as edge devices, has enabled BNNs to achieve performance comparable to traditional full precision networks [Hubara et al., 2016].

The reduced memory requirement and simplified computation resulting from this representation come with a drawback: binarized neural networks are harder to train. A major challenge is back propagating the gradient of the weights through sign functions. There are workarounds [Simons and Lee, 2019], such as using straight-through estimators (STE) [Hubara et al., 2016].
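The paper does not include training code; as a rough illustration of the straight-through estimator mentioned above, the sketch below follows the common PyTorch pattern of binarizing with sign in the forward pass while passing a clipped gradient through in the backward pass. The class name and the clipping threshold of 1 are conventional choices in the style of [Hubara et al., 2016], not details taken from this paper.

```python
import torch

class SignSTE(torch.autograd.Function):
    """Binarize with sign in the forward pass; use a straight-through
    estimator (clipped identity) for the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # sign with the convention sign(0) = +1
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # pass the incoming gradient through where |x| <= 1, zero it elsewhere
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

x = torch.randn(4, requires_grad=True)
y = SignSTE.apply(x)
y.sum().backward()   # x.grad is 1 where |x| <= 1 and 0 elsewhere
```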
This paper presents an approach to verify properties of BNNs that leverages their structure. The verification problem is formulated as a mixed integer programming (MIP) problem that encodes the input set X by constraining variables associated with the input layer of the network and the output set Y by constraining variables associated with the output layer of the network. In our approach, we leverage the binary nature of both the parameters and the activations of the network.

Experimentally, we show that our approach can verify properties of BNNs. Section 4 demonstrates the capabilities of our approach by formally assessing the adversarial robustness of BNNs trained on the MNIST dataset [LeCun and Cortes, 2010] and properties of a collision avoidance system [Katz et al., 2017]. We compare the runtime of our approach to that of the state-of-the-art implementation of a full precision network verification algorithm on the equivalent full precision networks. Our proposed approach significantly reduces verification runtime.

2 Background

2.1 Neural Networks

Consider F, a feedforward neural network with n layers, input $x \in D_x \subseteq \mathbb{R}^{k_0}$, and output $y \in D_y \subseteq \mathbb{R}^{k_n}$. Here, y = F(x) corresponds to evaluating the network on input x and obtaining output y, $k_0$ is the dimensionality of x, and $k_n$ is the dimensionality of y. We assume that all inputs and outputs are flattened to vectors.

Each layer in F is a function $f_i : \mathbb{R}^{k_{i-1}} \to \mathbb{R}^{k_i}$, where $k_i$ is the dimensionality of the hidden variable $z_i$ in layer i. Accordingly, $z_0 = x$ and $z_n = y$. The network can be represented by

$$F = f_n \circ f_{n-1} \circ \cdots \circ f_1 \quad (1)$$

and

$$z_i = f_i(z_{i-1}) = \sigma_i(W_i z_{i-1} + b_i) \quad (2)$$

where $W_i \in \mathbb{R}^{k_i \times k_{i-1}}$ is the weight matrix, $b_i \in \mathbb{R}^{k_i}$ is the bias vector, and $\sigma_i : \mathbb{R}^{k_i} \to \mathbb{R}^{k_i}$ is the activation function. Let $z_{i,j}$ be the value of the jth node in the ith layer, $w_{i,j} \in \mathbb{R}^{1 \times k_{i-1}}$ be the jth row of $W_i$, and $w_{i,j,k}$ be the kth entry in $w_{i,j}$. Given that activation functions are usually component-wise vector functions, we have

$$z_{i,j} = \sigma_{i,j}(w_{i,j} z_{i-1}) = \sigma_{i,j}\Big(\sum_k w_{i,j,k}\, z_{i-1,k}\Big) \quad (3)$$
$$\phantom{z_{i,j}} = \sigma_{i,j}(\hat{z}_{i,j}) \quad (4)$$

where $\hat{z}_i := W_i z_{i-1}$. We drop the bias terms in this analysis for compactness and without loss of generality.
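As a concrete reading of Eqs. (1)–(4), the following NumPy sketch evaluates a small feedforward network by composing its layers; the shapes, random weights, and the ReLU/identity activations are made up purely for illustration and are not from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(weights, biases, activations, x):
    """Evaluate F = f_n ∘ ... ∘ f_1 with z_0 = x and z_i = σ_i(W_i z_{i-1} + b_i)."""
    z = x
    for W, b, sigma in zip(weights, biases, activations):
        z = sigma(W @ z + b)
    return z

# a made-up 3-2-2 network
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((2, 2))]
biases = [np.zeros(2), np.zeros(2)]
y = forward(weights, biases, [relu, lambda v: v], np.array([1.0, -0.5, 2.0]))
print(y)
```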
2.2 Binarized Neural Networks

A binarized neural network is a network involving binary weights and activations [Hubara et al., 2016]. The goal of network binarization is to represent the floating-point weights W and the activations $z_{i,j}$ of a given layer using 1 bit. The parameters are represented by

$$Q(W) = \alpha B_W, \qquad Q(z) = \beta B_z \quad (5)$$

where $B_W$ and $B_z$ are binarized weights and binarized activations, with scaling factors $\alpha$ and $\beta$ used for batch normalization. The sign function is often used to compute $Q_W$ and $Q_z$:

$$\operatorname{sign}(x) = \begin{cases} +1, & \text{if } x \ge 0 \\ -1, & \text{otherwise.} \end{cases} \quad (6)$$

The above representation enables an easy implementation of batch normalization while keeping most parameters and operations binary. In this context, the arithmetic operations needed for a forward pass of a layer $z_i^b$ in a binarized network $F^b$ can be computed as

$$z_i^b = f_i(z_{i-1}^b) = \sigma_i\big(Q(W)_i\, z_{i-1}^b\big) \quad (7)$$
$$\phantom{z_i^b} = \sigma(\alpha\beta\, B_W \odot B_z) = \alpha\beta \operatorname{sign}(B_W \odot B_z) \quad (8)$$

where $\odot$ denotes the inner product for binary vectors, computed with the bitwise XNOR–Bitcount operation. In practice, a BNN is a composition of the following blocks.

Linear.
$$\hat{z}_i = Q_i z_{i-1} \quad (9)$$
where $Q_i \in \{-1, 1\}^{k_{i+1} \times k_i}$.

Batch Normalization.
$$\hat{z}_i = \alpha_{k_i}\left(\frac{y_i - \mu_{k_i}}{\sigma_{k_i}}\right) + \gamma_{k_i} \quad (10)$$
where $y = (y_1, \dots, y_{k_{i+1}})$ is the output of the preceding linear block and $\alpha_{k_i}, \gamma_{k_i}, \mu_{k_i}, \sigma_{k_i} \in \mathbb{R}$.

Activation Function.
$$z_i = \operatorname{sign}(\hat{z}_i) \quad (11)$$
where $\hat{z}_i \in \mathbb{R}^{k_{i+1}}$ and $z_i \in \{-1, 1\}^{k_{i+1}}$.

Some BNN architectures call for binarized inputs as well. Our verification approach does not require binarized inputs, but this requirement is easy to incorporate by representing floating or fixed point inputs as a combination of binary inputs, either by quantizing them or by directly using their binary representation.

Section 3 derives how each type of block can be encoded with linear constraints, enabling the formulation of a mixed-integer programming problem for verification purposes. The last layer of a network can be used in different ways, often using a softmax or an argmax operator. In either case, we can encode desired properties at the output of the layer before such functions, at the so-called logits.
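To make the XNOR–Bitcount identity behind Eq. (8) concrete, the short NumPy sketch below checks that the inner product of two ±1 vectors equals 2·popcount(XNOR) − k when the vectors are stored as bits with +1 ↦ 1 and −1 ↦ 0. The helper names and the bit convention are illustrative assumptions, not details from the paper.

```python
import numpy as np

def binarized_layer(Q, z_prev):
    """One binarized block per Eqs. (9) and (11): ẑ_i = Q_i z_{i-1}, z_i = sign(ẑ_i)."""
    z_hat = Q @ z_prev
    return np.where(z_hat >= 0, 1, -1)

def pm1_dot_via_xnor(w_bits, z_bits, k):
    """Inner product of ±1 vectors from their {0,1} encodings:
    w · z = 2 * popcount(XNOR(w, z)) - k."""
    matches = np.sum(~(w_bits ^ z_bits) & 1)   # popcount of the XNOR
    return 2 * int(matches) - k

rng = np.random.default_rng(0)
k = 8
w = rng.choice([-1, 1], size=k)
z = rng.choice([-1, 1], size=k)
w_bits, z_bits = (w > 0).astype(np.uint8), (z > 0).astype(np.uint8)
assert int(w @ z) == pm1_dot_via_xnor(w_bits, z_bits, k)
```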
2.3 Neural Network Verification

The verification problem consists of checking whether input-output relationships of a function hold [Liu et al., 2021]. A subset of the input space $X \subseteq D_x$ and a subset of the output space $Y \subseteq D_y$ are defined. In its most general form, solving the verification problem requires certifying whether the following holds:

$$x \in X \implies y = F(x) \in Y. \quad (12)$$

In general, the input and output sets could have any geometry, but they are often constrained by the verification algorithm. Our approach restricts the class of X and Y to closed polytopes, corresponding to intersections of half-spaces. These regions can be encoded as a conjunction of linear constraints, which is required for our mixed integer programming formulation.

Applications. Given that neural networks provide state-of-the-art performance for many tasks, such as perception and control [Katz et al., 2017], studying their robustness has attracted significant attention [Papernot et al., 2016]. In the context of image classification, a network F assigns a value to each of the possible labels in its training set, and the maximizer $\arg\max_i y_i$ is often used to impute the label of an input x. Consider an input $x_0$ with label $i^* \in \{1, \dots, k_n\}$. It would be desirable that $y_{i^*} > y_j$ for all $j \neq i^*$, which can be encoded with the following sets:

$$X = \{x \in D_x : \|x - x_0\|_p \le \epsilon\}, \quad (13)$$
$$Y = \{y \in D_y : y_{i^*} > y_j \ \forall j \neq i^*\}, \quad (14)$$

where $\epsilon$ is the radius of the allowable perturbation in the input. If $p = 1$ or $p = \infty$, we have linear constraints. Encoding the output set Y is not possible with a single linear program given that the maximum operator requires a disjunction of half-spaces. With our MIP formulation, the set can be encoded directly.

Full Precision Neural Network Verification
There are plenty of approaches to verifying properties of neural networks [Liu et al., 2021]. Some methods approach verification as an optimization problem, which is an idea that we use here. Many methods that only work for piecewise-linear functions implement a search strategy over the activation state of the nodes in the network. Some apply Satisfiability Modulo Theories (SMT) by iteratively searching for assignments that satisfy all given constraints while treating the non-linear constraints lazily. One such method is the Reluplex algorithm [Katz et al., 2017; Katz et al., 2019], which can be used to verify properties of binarized neural networks. However, without significant modification, it would not exploit the specific characteristics of BNNs.

Binarized Neural Network Verification
There are a few approaches designed specifically for verifying properties of BNNs. Some rely on SAT solvers by reducing the verification problem to a Boolean satisfiability problem [Narodytska et al., 2018; Narodytska et al., 2020], which limits their applicability exclusively to fully binarized networks (networks with exclusively binary parameters and activations). Recently, a SAT solver based approach that is able to handle BNN constraints to speed up the verification process was introduced [Jia and Rinard, 2020a]. Another approach, which can be applied to networks with both binary and full precision parameters and piecewise-linear activation functions, was recently introduced [Amir et al., 2021]; it is an SMT-based technique that extends the Reluplex algorithm [Katz et al., 2019] and includes various optimizations specifically designed to leverage BNNs.

Our approach relies on a simple mixed integer linear programming formulation, allowing us to handle both fully and partially binarized neural networks. A similar approach has been demonstrated to work well for DNNs [Tjeng et al., 2019], and MIP has also been used for reachability analysis [Lomuscio and Maganti, 2017]. Our main contribution is to apply MIP to the verification of BNNs.

3 Verification of BNNs using Mixed Integer Programming

Binarized neural networks are composed of piecewise-linear layers, which may be fully connected, convolutional, or average-pooling layers, with piecewise-linear activation functions such as ReLU, max, and sign. Other commonly used elements such as batch normalization, shortcut connections, and dropout can also be characterized as affine transformations at evaluation time.

In order to address the verification problem in Section 2.3, we encode each of the components of the network as a set of linear constraints and then apply existing MIP technology to solve it. We encode the input set and the linear equations that describe the forward pass of the network. We then encode $Y^C$, the complement of the output set Y, and search for a feasible assignment. Any feasible assignment corresponds to a counterexample of the desired property. If no feasible assignment is found, then we can conclude that the network maps all the points in X to Y, which certifies the property. Below we present the formulation of each block as a set of linear constraints.

Input and Output Sets. The verification problem consists of determining whether all the elements in an input set X map to an output set Y. In order to formulate that problem as a MIP, X must be expressible as a conjunction of linear constraints. Additionally, given that the MIP solver will search for a feasible assignment, we need to formulate $Y^C$ as another polyhedron, given that

$$x \in X \implies y = F(x) \in Y \quad (15)$$
$$\iff \quad (16)$$
$$x \in X \implies y = F(x) \notin Y^C. \quad (17)$$

Therefore, the input set and the complement of the output set have to be encoded as linear constraints on their corresponding variables.
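The paper does not include code for this step; as a rough sketch of how the sets in Eqs. (13), (14), and (17) translate into solver constraints, the function below uses the PuLP modeling library (an assumption; the paper only states that a mixed integer program solver is used). The helper name, variable names, and the fixed big-M constant are illustrative; in practice the constant would be derived from bounds on the logits.

```python
import pulp

def add_robustness_query(prob, x_vars, y_vars, x0, eps, i_star, big_m=1000.0):
    """Constrain x to the l_inf ball around x0 (Eq. 13) and y to the complement
    of Y = {y : y_{i*} > y_j for all j != i*} (Eqs. 14 and 17).
    A feasible assignment is then a candidate adversarial example."""
    # input set X: elementwise box constraints
    for xj, x0j in zip(x_vars, x0):
        prob += xj >= x0j - eps
        prob += xj <= x0j + eps
    # output set Y^C: at least one other logit matches or exceeds logit i*
    deltas = []
    for j, yj in enumerate(y_vars):
        if j == i_star:
            continue
        d = pulp.LpVariable(f"delta_{j}", cat="Binary")
        deltas.append(d)
        prob += yj >= y_vars[i_star] - big_m * (1 - d)
    prob += pulp.lpSum(deltas) >= 1
```

Here `x_vars` and `y_vars` stand for the MIP variables attached to the input and logit layers by the block encodings given next; a feasible solution to the resulting program is a counterexample, while infeasibility certifies the property.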
Linear. $\hat{z}_i = Q_i z_{i-1}$ with $Q_i \in \{-1, 1\}^{k_{i+1} \times k_i}$ is already linear and is encoded row by row:

$$\hat{z}_{i,j} = q_j^T z_{i-1}, \qquad j = 1, \dots, k_{i+1}, \quad (18)$$

where $q_j$ is the jth row of $Q_i$.

ReLU. $z_i = \operatorname{ReLU}(\hat{z}_i) = \max(0, \hat{z}_i)$. Given bounds $l \le \hat{z}_i \le u$ and a binary variable $\beta \in \{0, 1\}$, we can encode the block as:

$$z_i \le \hat{z}_i - l(1 - \beta) \quad (19)$$
$$z_i \ge \hat{z}_i \quad (20)$$
$$z_i \le \beta u \quad (21)$$
$$z_i \ge 0 \quad (22)$$
$$\beta \in \{0, 1\} \quad (23)$$

Sign. $z_i = \operatorname{sign}(\hat{z}_i)$, that is,

$$\hat{z}_i \ge 0 \implies z_i = 1 \quad (24)$$
$$\hat{z}_i < 0 \implies z_i = -1 \quad (25)$$

Given bounds $l \le \hat{z}_i \le u$, this can be formulated as

$$-1 \le z_i \quad (26)$$
$$z_i \le 1 \quad (27)$$
$$l \beta \le \hat{z}_i \quad (28)$$
$$\hat{z}_i \le u(1 - \beta) \quad (29)$$
$$z_i = 1 - 2\beta \quad (30)$$
$$\beta \in \{0, 1\} \quad (31)$$

Additionally, given the structure of the linear block that precedes the sign block and the fact that its input variables only take values in $\{-1, 1\}$, we can always easily compute valid bounds with $l \ge -k_i$ and $u \le k_i$.

Batch Normalization. The batch normalization blocks have the output of linear blocks as their input:

$$z_i = \alpha_{k_i}\left(\frac{y_i - \mu_{k_i}}{\sigma_{k_i}}\right) + \gamma_{k_i} \quad (32)$$

where $y = (y_1, \dots, y_{k_{i+1}})$ and $\alpha_{k_i}, \gamma_{k_i}, \mu_{k_i}, \sigma_{k_i} \in \mathbb{R}$, assuming $\sigma_{k_i} > 0$. This can be rewritten as

$$\sigma_{k_i} z_i = \alpha_{k_i} y_i - \alpha_{k_i} \mu_{k_i} + \sigma_{k_i} \gamma_{k_i}, \qquad i = 1, \dots, k_{i+1}, \quad (33)$$

which is a linear constraint.

Max. $y = \max(x_1, x_2, \dots, x_m)$ with $l_i \le x_i \le u_i$ can be formulated as a set of linear constraints in the following way:

$$y \le x_i + (1 - \beta_i)(u_{\max,-i} - l_i), \qquad i = 1, \dots, m \quad (34)$$
$$y \ge x_i, \qquad i = 1, \dots, m \quad (35)$$
$$\sum_{i=1}^m \beta_i = 1 \quad (36)$$
$$\beta_i \in \{0, 1\} \quad (37)$$

where $u_{\max,-i} = \max_{j \neq i} u_j$.

In practice, BNNs are implemented by repeated compositions of the blocks described above. In the case of BNNs, the usual order of the linear layer and batch normalization is exchanged, as suggested by [Hubara et al., 2016], but the MIP formulation is equivalent.

Using the above formulations, we proceeded to encode the verification problem by parsing a BNN to construct the constraints that represent its forward pass and the corresponding input and output set constraints. We then call a mixed integer program solver and search for a feasible solution. If a feasible solution is found, then it certifies that the property does not hold, and it serves as a counterexample. If the search terminates without finding a feasible solution, then, given the completeness of the procedure, we certify that the property holds.

We implemented a tool that parses networks and produces their corresponding encoding for the solver. In the following section, we present the experiments we used to evaluate our approach and compare it to traditional DNN verification tools.
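To tie the pieces together, the following is a minimal end-to-end sketch, again assuming PuLP with its bundled CBC solver rather than the authors' tool: it encodes a toy two-input BNN (one sign layer followed by a linear output layer) with the sign-block constraints (26)–(31), constrains the input to an ℓ∞ ball as in Eq. (13), asserts the complement of Y = {y_0 > y_1}, and checks feasibility. All weights and the ε value are made up for illustration.

```python
import pulp

# ---- a toy, fully made-up BNN: 2 inputs -> 2 sign units -> 2 logits ----
Q1 = [[1, -1], [-1, -1]]    # hidden linear block, entries in {-1, 1}
Q2 = [[1, -1], [-1, 1]]     # output linear block
x0, eps = [0.6, -0.2], 0.1  # center point and allowed l_inf perturbation

prob = pulp.LpProblem("bnn_robustness", pulp.LpMinimize)
prob += pulp.lpSum([])      # pure feasibility problem: constant zero objective

# input variables constrained to the l_inf ball around x0 (Eq. 13)
x = [pulp.LpVariable(f"x{k}", x0[k] - eps, x0[k] + eps) for k in range(2)]

# hidden layer: z_hat = Q1 x (Eq. 18), z = sign(z_hat) via Eqs. (26)-(31)
z = []
for j, row in enumerate(Q1):
    z_hat = pulp.lpSum(row[k] * x[k] for k in range(2))
    # interval-arithmetic bounds l <= z_hat <= u over the input box
    lo = sum(min(row[k] * (x0[k] - eps), row[k] * (x0[k] + eps)) for k in range(2))
    hi = sum(max(row[k] * (x0[k] - eps), row[k] * (x0[k] + eps)) for k in range(2))
    beta = pulp.LpVariable(f"beta{j}", cat="Binary")
    zj = pulp.LpVariable(f"z{j}", -1, 1)
    prob += z_hat >= lo * beta          # Eq. (28)
    prob += z_hat <= hi * (1 - beta)    # Eq. (29)
    prob += zj == 1 - 2 * beta          # Eq. (30)
    z.append(zj)

# output logits and the complement of Y = {y0 > y1}: search for y1 >= y0
y = [pulp.lpSum(Q2[j][k] * z[k] for k in range(2)) for j in range(2)]
prob += y[1] >= y[0]

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
if pulp.LpStatus[status] == "Infeasible":
    print("property verified: no adversarial example in the input box")
else:
    print("counterexample:", [v.value() for v in x])
```

For more than two output classes, the complement of Y requires the indicator-variable disjunction shown in the earlier sketch.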
4 Experiments

To demonstrate our approach, we performed a series of experiments and compared the performance of our approach to that of other publicly available verification algorithms. However, given that our algorithm is specifically designed for BNNs, we compare its performance to that of verification algorithms for DNNs. Even though the networks are not identical, we opted for this comparison to motivate the use of BNNs for tasks where DNNs are normally favored, given that we expect BNN verification to be significantly faster.

4.1 MNIST

We trained DNNs and BNNs on the MNIST dataset [LeCun and Cortes, 2010]. We did not tune hyper-parameters, and we used a very simple training setup. The test set accuracy of the DNN was 98.2% and that of the BNN 95.6%. We thresholded all the grayscale values to black and white, which is equivalent to adding a binarization block at the input of the network.

We verified robustness properties using the input and output sets defined in equations (13) and (14), allowing a maximum perturbation of ε around known input points using the p = ∞ norm. We set a time limit of 120 seconds for each property and report our results in Table 1.

          ε     Mean (s)   Max (s)   Timeout   Verified (%)   Data (%)
  BNN     0.1   0.223      3.21      0.00%     88.24          95.6
  DNN     0.1   5.47       28.12     0.05%     94.33          98.22
  BNN     0.3   0.194      4.54      0.00%     61.78          95.6
  DNN     0.3   7.12       41.33     1.02%     80.68          98.22

Table 1: MNIST results. The ε column indicates the maximum allowed perturbation that defines X. The Mean column corresponds to the average time needed for the properties that did not time out, the Max column shows the maximum time taken to verify a property that did not time out, and the Timeout column shows the proportion of properties that exceeded the time limit. The verified accuracy corresponds to the proportion of samples for which the input set X defined a property that was verified, whereas the data accuracy column shows the accuracy of the network evaluated on the test set.

4.2 ACAS

We used a network and a property introduced by Katz et al. [2017]. We trained a BNN version of the ACAS controller and tested a subset of Property 10 that only requires running one query. We sliced the state space by setting the time until loss of vertical separation τ = 5 and the previous advisory to clear-of-conflict. We fixed the speed of ownship and the speed of intruder values to simplify the property. For the BNN, we used 8 bits to represent each input and added a layer at the input to decrease the dimensionality to that of the original ACAS network. Quantizing the input can significantly alter the behavior of the controller; our goal was simply to assess the speedup of verification. We also trained versions of the network with full precision inputs, but neither satisfied the property. We report our results in Table 2.

                         Loss       Time (s)   Result
  BNN                    2174.43    2.37       violated
  full precision DNN     1203.44    41.44      holds
  8 bit BNN              1634.25    5.73       holds

Table 2: ACAS results.
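The ACAS experiment represents each input with 8 bits, but the exact encoding is not described in the paper. The sketch below shows one plausible scheme (uniform fixed-point quantization whose bits are exposed as ±1 features a BNN can consume directly), included purely for illustration; the function name, the input range, and the little-endian bit ordering are assumptions.

```python
import numpy as np

def to_pm1_bits(x, lo, hi, n_bits=8):
    """Quantize a scalar in [lo, hi] to an unsigned n-bit code and expose
    each bit as a ±1 feature. One plausible scheme, not the paper's exact one."""
    levels = 2 ** n_bits - 1
    code = int(round((np.clip(x, lo, hi) - lo) / (hi - lo) * levels))
    bits = [(code >> k) & 1 for k in range(n_bits)]   # little-endian bits
    return np.array([1 if b else -1 for b in bits])

print(to_pm1_bits(0.37, lo=0.0, hi=1.0))   # 8 features in {-1, +1}
```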
5 Conclusion

Our results indicate that our simple MIP approach for verifying properties of BNNs performs significantly faster than other methods do for DNNs. Training BNNs is challenging, but the reduction in verification cost should be considered and incentivizes their use for safety-critical applications that require verification of certain properties.

In our MNIST experiments, we were able to verify the robustness of networks with about a 10× reduction in verification time. Our ACAS experiments show that our approach is able to verify BNNs that implement controllers in about 20× reduced time. BNNs appear to be particularly well suited as controllers for safety critical systems.

Our proposed approach encodes BNNs as mixed integer linear programs and is able to verify properties of binarized neural networks and partially binarized neural networks. Our experiments indicate that this approach is about an order of magnitude faster than verifying properties of comparable full precision neural networks.

The ease with which we can verify BNNs should increase their use for safety critical applications. BNNs are harder to train, but the difficulty might be worth the cost given how much faster verification becomes, along with the efficient hardware implementations that they enable. The use of BNNs has some drawbacks and requires considerations such as how to handle non-binary inputs. Quantizing the inputs allows us to preserve the binary architecture but decreases the applicability of BNNs, because some applications might have continuous input domains that would be better modeled with floating point numbers. However, it appears that even verified networks with floating point parameters are potentially unsafe [Jia and Rinard, 2020b].

We used a general purpose mixed integer programming solver. A potential area of future research would be to design a MIP solver that exploits some of the BNN-specific characteristics of the problems to further decrease verification time. Another potential direction would be to train ternary BNNs in order to explore the impact of sparsification on runtime, given that the equations that encode ternary BNNs require fewer variables.
References

[Amir et al., 2021] Guy Amir, Haoze Wu, Clark Barrett, and Guy Katz. An SMT-based approach for verifying binarized neural networks. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2021.

[Hubara et al., 2016] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural networks. In Advances in Neural Information Processing Systems, 2016.

[Jia and Rinard, 2020a] Kai Jia and Martin Rinard. Efficient exact verification of binarized neural networks. In Advances in Neural Information Processing Systems, 2020.

[Jia and Rinard, 2020b] Kai Jia and Martin Rinard. Exploiting verified neural networks via floating point numerical error. CoRR, abs/2003.03021, 2020.

[Katz et al., 2017] Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 2017.

[Katz et al., 2019] Guy Katz, Derek A. Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, et al. The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, 2019.

[LeCun and Cortes, 2010] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.

[Liu et al., 2021] Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett, and Mykel J. Kochenderfer. Algorithms for verifying deep neural networks. Foundations and Trends in Optimization, 4(3-4):244–404, 2021.

[Lomuscio and Maganti, 2017] Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. CoRR, abs/1706.07351, 2017.

[Narodytska et al., 2018] Nina Narodytska, Shiva Kasiviswanathan, Leonid Ryzhyk, Mooly Sagiv, and Toby Walsh. Verifying properties of binarized deep neural networks. In AAAI Conference on Artificial Intelligence (AAAI), 2018.

[Narodytska et al., 2020] Nina Narodytska, Hongce Zhang, Aarti Gupta, and Toby Walsh. In search for a SAT-friendly binarized neural network architecture. In International Conference on Learning Representations, 2020.

[Papernot et al., 2016] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy (EuroS&P), 2016.

[Simons and Lee, 2019] Taylor Simons and Dah-Jye Lee. A review of binarized neural networks. Electronics, 8(6), 2019.

[Tjeng et al., 2019] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, 2019.