=Paper=
{{Paper
|id=Vol-3409/paper1
|storemode=property
|title=Compiling Neural Network Classifiers into Boolean Circuits for Efficient Shap-Score Computation (short paper)
|pdfUrl=https://ceur-ws.org/Vol-3409/paper1.pdf
|volume=Vol-3409
|authors=Leopoldo Bertossi,Jorge E. Leon
|dblpUrl=https://dblp.org/rec/conf/amw/BertossiL23
}}
==Compiling Neural Network Classifiers into Boolean Circuits for Efficient Shap-Score Computation (short paper)==
Compiling Neural Network Classifiers into Boolean
Circuits for Efficient Shap-Score Computation
Leopoldo Bertossi¹, Jorge E. León²
¹ SKEMA Business School, Montreal, Canada
² Universidad Adolfo Ibáñez (UAI), Santiago, Chile
Abstract
We describe the transformation of binary neural networks (BNNs) for classification into deterministic
and decomposable Boolean circuits by means of knowledge compilation techniques. The resulting circuit
is used, as an open-box model, to compute Shap scores taking advantage of a recent efficient algorithm
for Shap on this class of circuits. We show experimental results that corroborate the huge improvement
in score computation time in comparison with the computation that directly uses the BNN as a black-box
model.
Keywords
Explainable ML, Shap Scores, Knowledge Compilation, Neural Networks, Boolean Decision Circuits
1. Introduction
Explanations for the outcomes of classification models come in different forms, and can be
obtained through different approaches. A common one assigns attribution scores to the feature
values associated with an input that goes through an ML-based model, to quantify their relevance
for the obtained outcome. We concentrate on local scores, i.e. associated with a particular input,
as opposed to global scores that indicate the overall relevance of a feature. In this work, we
also concentrate on explanations for binary classification models, whose features take binary
values, as does the classification label, say 0 or 1.
A popular local score is Shap [10], which is based on the Shapley value that has been introduced
and used in coalition game theory and practice for a long time [15, 13]. Shap scores can be
computed with a black-box or an open-box model [14]. With the former, we do not know or use
its internal components, but only its input/output relation. This is the most common approach.
In the latter case, we can have access to its internal structure and components, and we can
use them for score computation. It is common to consider neural-network-based models as
black-box models, because their internal gates and structure may be difficult to understand or
process when it comes to explaining classification outputs. However, a decision-tree model, due
to its much simpler structure and use, is considered to be open-box for the same purpose.
Even for binary classification models, the complexity of Shap computation is provably hard,
actually #𝑃 -hard for several kinds of such models, independently of whether
the internal components of the model are used when computing Shap [3, 1, 2]. However, there
are classes of classifiers for which, using the model components and structure, the complexity
of Shap computation can be brought down to polynomial time [11, 2, 18].
A polynomial time algorithm for Shap computation with deterministic and decomposable
Boolean circuits (dDBCs) was presented in [2]. From this result, the tractability of Shap computa-
tion can be obtained for a variety of Boolean circuit-based classifiers that can be represented as
(or compiled into) them. In particular, this holds for Ordered Binary Decision Diagrams (OBDDs)
[5], decision trees, and other established classification models that can be compiled into OBDDs
[16, 6, 12]. Similar results can be obtained for Sentential Decision Diagrams (SDDs) [9], which
can be seen as dDBCs, and form a convenient knowledge compilation target language [7, 17].
In [18], through a different approach, tractability of Shap computation was obtained for a
collection of classifiers that intersects with that in [2].
In this work, we concentrate on explicitly developing this approach to the efficient com-
putation of Shap for binary neural networks (BNNs). For this, and inspired by [16], a BNN
is transformed into a dDBC using techniques from knowledge compilation [7], an area that
investigates the transformation of (usually) propositional theories into equivalent ones in a
canonical syntactic form that has good computational properties, e.g. tractable model
counting. The compilation may incur a relatively high computational cost [7, 8], but it may
still be worth the effort when a particular property will be checked often, as is the case of
explanations for the same BNN.
We also make experimental comparisons between this open-box and circuit-based Shap
computation and that based directly on the BNN treated as a black-box, i.e. using only its
input/output relation. We perform comparisons in terms of computation time and alignment of
Shap scores. We confirm that Shap computation via the dDBC vastly outperforms the direct
Shap computation on the BNN. It is also the case that the scores obtained are fully aligned, as
expected since the dDBC represents the BNN. A detailed account of our work can be found in
[4].
2. Background
Consider a fixed entity e = ⟨𝐹1 (e), . . . , 𝐹𝑁 (e)⟩ subject to classification. It has values 𝐹𝑖 (e) for
features ℱ = {𝐹1 , . . . , 𝐹𝑁 }. In [10, 11], the Shapley value is applied with {𝐹1 (e), . . . , 𝐹𝑁 (e)}
as the set of players, and the game function 𝒢e (𝑠) := E(𝐿(e′ ) | e′𝑠 = e𝑠 ), giving rise to the
Shap score. Here, e𝑠 is the projection (or restriction) of e onto the subset 𝑠 of features, and 𝐿 is
the label. The e′ inside the expected value is an entity whose values coincide with those of e
for the features in 𝑠. For 𝐹 ∈ ℱ and entity e:
$$\mathit{Shap}(\mathcal{F}, \mathcal{G}_{\mathbf{e}}, F) \;=\; \sum_{s \subseteq \mathcal{F}\setminus\{F\}} \frac{|s|!\,(|\mathcal{F}|-|s|-1)!}{|\mathcal{F}|!}\,\big[\,\mathbb{E}(L(\mathbf{e}') \mid \mathbf{e}'_{s\cup\{F\}} = \mathbf{e}_{s\cup\{F\}}) \;-\; \mathbb{E}(L(\mathbf{e}') \mid \mathbf{e}'_{s} = \mathbf{e}_{s})\,\big].$$
The expected value is defined on the basis of an underlying probability distribution on the
entity population. Shap quantifies the contribution of feature value 𝐹 (e) to the outcome label.
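To make the definition concrete, the following is a minimal brute-force sketch in Python of this Shap score under the uniform distribution over binary entities (the distribution also used in Section 4). All names here are ours and purely illustrative; the nested enumeration of feature subsets and completions is exponential, which is precisely the hardness issue discussed next.

```python
from itertools import combinations
from math import factorial

def shap_score(classifier, e, F, n_features):
    """Brute-force Shap for feature F of a binary entity e (tuple of 0/1),
    under the uniform distribution over {0,1}^n_features. Exponential in
    n_features: a sketch of the definition, for illustration only."""
    features = list(range(n_features))
    others = [i for i in features if i != F]

    def cond_expectation(fixed):
        # E(L(e') | e'_s = e_s): average the label over all uniform
        # completions of the features outside the fixed set s.
        free = [i for i in features if i not in fixed]
        total = 0
        for mask in range(2 ** len(free)):
            e2 = list(e)
            for j, i in enumerate(free):
                e2[i] = (mask >> j) & 1
            total += classifier(tuple(e2))
        return total / (2 ** len(free))

    score = 0.0
    for k in range(len(others) + 1):
        for s in combinations(others, k):
            weight = (factorial(k) * factorial(n_features - k - 1)
                      / factorial(n_features))
            score += weight * (cond_expectation(set(s) | {F})
                               - cond_expectation(set(s)))
    return score

# e.g., for an "at most one feature is 1" classifier on three features:
# shap_score(lambda x: int(sum(x) <= 1), (0, 1, 0), F=1, n_features=3)
```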
In order to compute Shap, we only need function 𝐿, and none of the internal components of
the classifier. Given that all possible subsets of features appear in its definition, Shap is bound
to be hard to compute. Actually, for some classifiers, its computation may become #𝑃 -hard
(c.f. [2] for some cases). However, in [2], it is shown that Shap can be computed in polynomial
time for every deterministic and decomposable Boolean circuit (dDBC) used as a classifier. The
circuit’s internal structure is used in the computation.
3. BNN Compilation
A BNN is first compiled into a propositional formula in Conjunctive Normal Form (CNF), which,
in its turn, is compiled into an SDD, which is finally compiled into a dDBC. We compute
Shap on the resulting circuit via the efficient algorithm in [2]. This compilation is performed
once, and is independent of any input to the classifier. The final circuit can be used to
compute Shap scores for different input entities. We show the compilation path by means of a
simple example.
In our BNNs we used, for each gate 𝑔, a step function as activation function, of the form:

$$\varphi_g(\bar{\imath}) \;=\; \mathrm{sp}(\bar{w}_g \bullet \bar{\imath} + b_g) \;:=\; \begin{cases} \;\;\,1 & \text{if } \; \bar{w}_g \bullet \bar{\imath} + b_g \geq 0,\\ -1 & \text{otherwise,} \end{cases}$$

which is parameterized by a vector of binary weights 𝑤̄𝑔 and a real-valued constant bias 𝑏𝑔 . For
technical, non-essential reasons, we used inputs and outputs −1, 1, except for the output gate,
𝑜, that returns 0 or 1.
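To see how the firing condition of such a gate becomes a propositional formula, which is the first step of the compilation, one can naively enumerate the gate's ±1 input vectors. The sketch below does exactly that; the function names are ours, the parameters follow our reading of the example gate h1 and should be treated as illustrative, and the paper's actual encoding, based on counting "convenient" inputs, is far more economical (c.f. [4]).

```python
from itertools import product

def gate_fires(weights, bias, inputs):
    # Step activation over {-1, 1} inputs/weights with a real-valued bias.
    return sum(w * i for w, i in zip(weights, inputs)) + bias >= 0

def gate_to_dnf(weights, bias):
    """Encode 'the gate outputs 1' by brute force: one conjunction of input
    literals per firing input vector (exponential in the number of inputs)."""
    return [vals for vals in product([-1, 1], repeat=len(weights))
            if gate_fires(weights, bias, vals)]

# Hypothetical parameters in the spirit of the example's gate h1; the four
# firing vectors below collapse to (¬x1 ∧ ¬x2) ∨ (¬x1 ∧ x3) ∨ (¬x2 ∧ x3).
print(gate_to_dnf(weights=[-1, -1, 1], bias=0.16))
```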
[Figure: the example BNN (LHS) and the propositional formula defining its output gate 𝑜 (RHS), obtained by replacing the defining formulas of gates h1, h2 and h3; right below them, the simplified CNF formula

$$o \;\longleftrightarrow\; (\neg x_1 \vee \neg x_2) \wedge (\neg x_1 \vee \neg x_3) \wedge (\neg x_2 \vee \neg x_3).$$ ]
The BNN on the LHS above is transformed into the propositional formula on the RHS. After
that, it is transformed into a simplified CNF formula, which is shown right below. In its turn,
the CNF is transformed into the SDD on the RHS below. In general, this is the most expensive
step during the overall compilation time, but it has an upper bound that is exponential in the
tree-width of the formula [9].
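As a quick sanity check of the simplified CNF shown above, one can verify in plain Python that it coincides with a direct description of the example classifier, namely that at most one of the three input features is 1. This is a small illustrative sketch of ours, not part of the paper's pipeline.

```python
from itertools import product

# The simplified CNF from the example ...
cnf = lambda x1, x2, x3: ((not x1 or not x2) and (not x1 or not x3)
                          and (not x2 or not x3))
# ... is exactly the pairwise "at most one of x1, x2, x3 is 1" condition:
at_most_one = lambda x1, x2, x3: x1 + x2 + x3 <= 1

assert all(cnf(*v) == at_most_one(*v) for v in product([0, 1], repeat=3))
```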
[Figure 3: An SDD (a) and a vtree (b).]
Finally, the SDD is easily transformed into a dDBC, the one shown on the RHS here. It can
be used as a binary classifier, with binary input features 𝑥1 , 𝑥2 , 𝑥3 . The binary label is read off
from the top node. This circuit is deterministic in that, for every ∨-gate, at most one of its
inputs is 1 when the output is 1. It is decomposable in that, for every ∧-gate, the inputs do not
share features. This dDBC is also smooth, in that sub-circuits that feed a same ∨-gate share the
same features. It has fan-in at most two, in that every ∧-gate and ∨-gate have at most two
inputs.
[Figure: the dDBC obtained from the SDD.]
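These structural properties can be checked mechanically. The sketch below uses our own toy encoding of circuits as nested tuples, not the paper's machinery: decomposability is checked syntactically, determinism by exhaustive evaluation (feasible at three variables), and the example circuit encodes the "at most one input is 1" classifier in a form that is deterministic, decomposable, smooth, and of fan-in two.

```python
from itertools import product

# Toy circuit encoding: ('var', i), ('not', g), ('and', g1, g2), ('or', g1, g2).

def eval_gate(g, x):
    op = g[0]
    if op == 'var':
        return x[g[1]]
    if op == 'not':
        return 1 - eval_gate(g[1], x)
    if op == 'and':
        return eval_gate(g[1], x) & eval_gate(g[2], x)
    return eval_gate(g[1], x) | eval_gate(g[2], x)

def variables(g):
    if g[0] == 'var':
        return {g[1]}
    return set().union(*(variables(h) for h in g[1:]))

def is_decomposable(g):
    # Decomposability: the two inputs of every ∧-gate share no features.
    if g[0] == 'var':
        return True
    if g[0] == 'and' and variables(g[1]) & variables(g[2]):
        return False
    return all(is_decomposable(h) for h in g[1:])

def is_deterministic(g, n):
    # Determinism: the two inputs of an ∨-gate are never 1 simultaneously.
    if g[0] == 'var':
        return True
    if g[0] == 'or':
        if any(eval_gate(g[1], x) and eval_gate(g[2], x)
               for x in product([0, 1], repeat=n)):
            return False
    return all(is_deterministic(h, n) for h in g[1:])

# "At most one of x1, x2, x3 is 1", as mutually exclusive disjuncts (each
# disjunct mentions all three variables, so the circuit is also smooth):
none_set = ('and', ('not', ('var', 0)),
                   ('and', ('not', ('var', 1)), ('not', ('var', 2))))
only_x1 = ('and', ('var', 0),
                  ('and', ('not', ('var', 1)), ('not', ('var', 2))))
only_x2 = ('and', ('not', ('var', 0)),
                  ('and', ('var', 1), ('not', ('var', 2))))
only_x3 = ('and', ('not', ('var', 0)),
                  ('and', ('not', ('var', 1)), ('var', 2)))
circuit = ('or', none_set, ('or', only_x1, ('or', only_x2, only_x3)))
assert is_decomposable(circuit) and is_deterministic(circuit, 3)
```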
4. Experiments
For our experiments, we consider real estate as an application domain, where house prices
depend on certain features, which we appropriately binarize.¹ The problem consists in classifying
property blocks, represented as entity records of thirteen feature values, as high-value or
low-value. This is a binary classification problem for which a BNN is first learnt, and then used.
After the transformation of the BNN into circuit 𝒞, we had the possibility to compute Shap,
for a given input entity, in three different ways: (a) Directly on the BNN as a black-box model,
i.e. using only its input/output relation for multiple calls; (b) Similarly, using the circuit 𝒞 as a
black-box model; and (c) With the efficient algorithm in [2] for smooth dDBCs with fan-in 2,
treating circuit 𝒞 as an open-box.
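For concreteness, modes (a) and (b) can be driven by the brute-force shap_score sketch from Section 2, since both only call the model as a function. The harness below is hypothetical scaffolding of ours, not the authors' code; mode (c) would instead run the polynomial-time circuit algorithm of [2], which we do not reproduce here.

```python
import time

def average_shap_seconds(classifier, entities, n_features):
    """Average seconds per entity to compute the Shap scores of all features,
    treating `classifier` as a black box (modes (a) and (b))."""
    start = time.perf_counter()
    for e in entities:
        for F in range(n_features):
            shap_score(classifier, e, F, n_features)  # sketch from Section 2
    return (time.perf_counter() - start) / len(entities)

# e.g., compare average_shap_seconds(bnn_predict, sample, 13) against
# average_shap_seconds(circuit_eval, sample, 13), where bnn_predict and
# circuit_eval are hypothetical wrappers around the two models.
```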
We performed these three computations for sets of 20, 40, 60, 80, and 100 input entities, to
compare average times with increasing numbers of entities. In all cases, Shap was computed
with the uniform probability distribution over the joint feature domain of size 2¹³. The results
shown right below report on the seconds taken to compute Shap on 20, 40, 60, 80 and 100
entities; using the BNN as a black-box (blue bar), the dDBC as a black-box (red bar), and the
dDBC as an open-box (orange bar). Note that the vertical axis employs a logarithmic scale.
[Figure: bar chart of Shap computation times for 20–100 entities, log-scale vertical axis.]
In our experiment, the initial transformation of the BNN into CNF took 1.3 hrs, which was
the most expensive part of the compilation. However, it is a one-time computation, and our
rather naive transformation algorithm leaves considerable room for improvement.
¹We use the California Housing Prices dataset available at https://www.kaggle.com/datasets/camnugent/california-housing-prices.
5. Acknowledgments
We are grateful, for their scientific and technical help, to: Arthur Choi, Andy Shih, Norbert
Manthey, Maximilian Schleich, and Adnan Darwiche. Part of this work was funded by ANID -
Millennium Science Initiative Program - Code ICN17002; and “Centro Nacional de Inteligencia
Artificial” CENIA, FB210017 (Financiamiento Basal para Centros Científicos y Tecnológicos
de Excelencia de ANID), Chile. Both CENIA and SKEMA Canada funded Jorge León’s visit to
Montreal. L. Bertossi is a Professor Emeritus at Carleton University, Ottawa, Canada; and a
Senior Fellow of the Universidad Adolfo Ibáñez (UAI), Chile.
References
[1] Arenas, M., Barceló, P., Bertossi, L. and Monet, M. On the Complexity of SHAP-Score-Based
Explanations: Tractability via Knowledge Compilation and Non-Approximability Results.
Journal of Machine Learning Research, 2023, 24(63):1-58. Extended version of [2].
[2] Arenas, M., Barceló, P., Bertossi, L. and Monet, M. The Tractability of SHAP-Score-Based
Explanations for Classification over Deterministic and Decomposable Boolean Circuits. In
Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021, 6670–6678.
[3] Bertossi, L., Li, J., Schleich, M., Suciu, D. and Vagena, Z. Causality-Based Explanation
of Classification Outcomes. In Proceedings of the 4th International Workshop on "Data
Management for End-to-End Machine Learning" (DEEM) at ACM SIGMOD/PODS, 2020, 1–10.
Posted as ArXiv paper 2003.06868.
[4] Bertossi, L. and Leon, J. E. Opening Up the Neural Network Classifier for Shap Score
Computation. 2023, ArXiv paper 2303.06516.
[5] Bryant, R. E. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transac-
tions on Computers, 1986, C-35(8):677–691.
[6] Darwiche, A. and Hirth, A. On The Reasons Behind Decisions. In Proceedings of the 24th
European Conference on Artificial Intelligence, 2020, 712–720.
[7] Darwiche, A. and Marquis, P. A Knowledge Compilation Map. Journal of Artificial
Intelligence Research, 2002, 17(1):229–264.
[8] Darwiche, A. On the Tractable Counting of Theory Models and its Application to Truth
Maintenance and Belief Revision. Journal of Applied Non-Classical Logics, 2001, 11(1-2):11–
34.
[9] Darwiche, A. SDD: A New Canonical Representation of Propositional Knowledge Bases.
In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11),
2011, 819–826.
[10] Lundberg, S. M. and Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In
Proceedings of the 31st International Conference on Neural Information Processing Systems,
2017, 4768–4777. ArXiv paper 1705.07874.
[11] Lundberg, S., Erion, G., Chen, H., DeGrave, A., Prutkin, J., Nair, B., Katz, R., Himmelfarb, J.,
Bansal, N. and Lee, S.-I. From Local Explanations to Global Understanding with Explainable
AI for Trees. Nature Machine Intelligence, 2020, 2(1):56–67. ArXiv paper 1905.04610.
[12] Narodytska, N., Kasiviswanathan, S., Ryzhyk, L., Sagiv, M. and Walsh, T. Verifying
Properties of Binarized Deep Neural Networks. In Proceedings of the 32nd AAAI Conference
on Artificial Intelligence, 2018, 6615–6624.
[13] Roth, A. E. The Shapley Value: Essays in Honor of Lloyd S. Shapley. Cambridge University
Press, 1988.
[14] Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions
and Use Interpretable Models Instead. Nature Machine Intelligence, 2019, 1:206–215. ArXiv
paper 1811.10154.
[15] Shapley, L. S. A Value for n-Person Games. In Contributions to the Theory of Games (AM-28),
1953, vol. 2, 307–318.
[16] Shi, W., Shih, A., Darwiche, A. and Choi, A. On Tractable Representations of Binary Neural
Networks. In Proceedings of the 17th International Conference on Principles of Knowledge
Representation and Reasoning, 2020, 882–892. ArXiv paper 2004.02082.
[17] Van den Broeck, G. and Darwiche, A. On the Role of Canonicity in Knowledge Compilation.
In Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015, 1641–1648.
[18] Van den Broeck, G., Lykov, A., Schleich, M. and Suciu, D. On the Tractability of SHAP
Explanations. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021,
6505–6513.