Probabilistic Abstract Interpretation on Neural Networks via Grids Approximation

Zhuofan Zhang¹, Herbert Wiklicky²

¹ ² Imperial College London, 180 Queen's Gate, South Kensington, London SW7 2AZ

Abstract
Probabilistic abstract interpretation is a theory used to extract particular properties of a computer program when it is infeasible to test every single input. In this paper we apply the theory to neural networks for the same purpose: to analyse the density distribution flow of all possible inputs of a neural network when the network has uncountably many, or countably but infinitely many, inputs. We show how this theoretical framework works on neural networks, and then discuss different abstract domains and the corresponding Moore-Penrose pseudo-inverses, together with the abstract transformers used in the framework. We also present experimental examples showing how this framework helps to analyse real-world problems.

Keywords
explainable AI, probabilistic abstract interpretation, neural network

1. Introduction
In recent years, deep neural networks have come to be regarded as an essential technique for machine learning as research on deep learning has progressed [1, 2], and they play a large role in many real-world industries. A major problem, however, is that it is difficult to interpret how neural networks process information and derive their decisions, especially as the layers get deeper. Interpreting neural networks is important because in some application areas the cost of mistakes is intolerable. Our approach makes use of the theory of probabilistic abstract interpretation to analyse the density distribution flow of the possible inputs of a neural network and to make sure the system is safe and under control.

Research on interpreting neural networks has been carried out for years. One approach is through rule extraction algorithms [3]. Activation maximisation is another approach, which interprets a model prediction by looking only at its relationship with the input, disregarding the inner process [4]. A further development of the technique is to add a decoder to it [5]. In the same year, Bazen et al. used Taylor decomposition to explain a model's decision by decomposing the output as a sum of relevance scores [6], and Landecker et al. decomposed the model prediction by back-propagating relevance in the reverse direction, progressively redistributing the prediction score until the input is reached [7]. Bach et al. further introduced the Layer-wise Relevance Propagation explanation framework [8]. Another measurement is to use sensitivity analysis to quantify the importance of each input variable [9]. In 2015 Zilke proposed DeepRED, extending the decompositional algorithm CRED [10]. Jacobsson [12] and Wang [13] proposed algorithms for understanding recurrent neural networks. In 2018 Zhang et al. and Zhou et al. respectively proposed methods to interpret convolutional neural networks [14, 15, 16]. Zeiler et al. instead used layer activation visualisation with the Deconvnet algorithm to understand CNNs [17]. Bau et al. proposed the Network Dissection framework, which quantifies the interpretability of the latent representations of CNNs [18]. In the meantime, attacks on neural networks with adversarial examples, and defences against them, are frequently studied [19].
Program analysis, on the other hand, is a research area which focuses on using mathematical theory and formal methods to automatically analyse the behaviour of a computer program with respect to a property such as safety, correctness, robustness or liveness [20, 22]. In particular, abstract interpretation is a theory for approximating the semantics of a computer program based on order theory, especially lattices [21, 23]. In recent years, probabilistic program analysis has been introduced to analyse quantitative aspects of computer programs. Probabilistic abstract interpretation, correspondingly, is the framework that lifts classical abstract interpretation to the probabilistic setting [24, 25, 26, 27]. In 2018 SRILAB proposed the $AI^2$ analyzer, which uses abstract interpretation to over-approximate a range of concrete values by an abstract element and checks whether this abstract element satisfies the expected result, thereby proving that every single input over-approximated by the abstract element also satisfies it [28]. Later, [29] proposed a new abstract domain which combines floating-point polyhedra with intervals; its performance was compared to Reluplex, an earlier property verification technique extending the simplex algorithm to support ReLU constraints [30]. Abstract interpretation has also been bridged with gradient-based optimization and can be applied to training neural networks [31].

In this paper, we use the theory of probabilistic abstract interpretation to analyse the density distribution flow of all possible inputs of a neural network, when the network has uncountably many, or countably but infinitely many, possible inputs. We also see that in particular situations probabilistic abstract interpretation has stronger features than the classical one. Section 2 gives a brief background on probabilistic abstract interpretation theory. In Section 3 we introduce how to apply the theory to neural networks and discuss different choices of abstract domains for different purposes, together with the corresponding Moore-Penrose pseudo-inverses and abstract transformers. In Section 4 we show in experiments how the framework helps in a real-world problem. Section 5 gives conclusions and future work.

2. Background: Probabilistic Abstract Interpretation Theory
In this section we give a brief introduction to probabilistic abstract interpretation theory, originally introduced in program analysis. Probabilistic abstract interpretation can be considered a quantitative version of classical abstract interpretation: it commonly gives not a safe but an average-case scenario, while the classical framework stays safe but may not be precise enough, and therefore addresses a worst-case scenario [20, 21, 25, 33, 26].

Figure 1: Probabilistic abstract interpretation with the Moore-Penrose pseudo-inverse pair $A$ and $G$. The left-hand side $C \xrightarrow{f} D$ is the concrete domain and the right-hand side $C^{\#} \xrightarrow{f^{\#}} D^{\#}$ is the abstract domain.
In the probabilistic framework, instead of forming a Galois connection as the classical framework does, we take a Moore-Penrose pseudo-inverse.

Definition 1. Let $C$ and $C^{\#}$ be two finite-dimensional vector (Hilbert) spaces and $A : C \to C^{\#}$ a linear map. Then the linear map $G : C^{\#} \to C$ is the Moore-Penrose pseudo-inverse of $A$ iff (i) $A \circ G = P_A$ and (ii) $G \circ A = P_G$, where $P_A$ and $P_G$ denote the orthogonal projections onto the ranges of $A$ and $G$.

As shown in Figure 1, $f$ is the original function in the concrete domain, which can be a step in a computer program. Based on $f$, we further consider the function $\vec{f}$ in the probability domain and the function $f^{\#}$ in the abstract domain. Given, in the concrete domain,

$$f : \mathbb{R} \to \mathbb{R} \quad (1)$$

we lift $f$ to $\vec{f}$, a mapping from a distribution to a distribution in the probabilistic domain,

$$\vec{f} : \mathcal{V}(\mathbb{R}) \to \mathcal{V}(\mathbb{R}) \quad (2)$$

where $\mathcal{V}(\mathbb{R})$ is the vector space over $\mathbb{R}$. From the probabilistic abstract interpretation framework in Figure 1, we can formalise $f^{\#}$ in the abstract domain as

$$f^{\#} = A' \vec{f} G \quad (3)$$

where the Moore-Penrose pseudo-inverses serve as the connections between the concrete and abstract domains. $f^{\#}$ is also called the abstract transformer. Defining an appropriate abstract domain and formalising an appropriate abstract transformer is crucial to the performance of this analysis framework.

Proposition 1. The abstract transformer of a composition of functions is the composition of the abstract transformers of each function; that is, if $f = f' \circ f''$, then

$$f^{\#} = f'^{\#} \circ f''^{\#} \quad (4)$$

3. Probabilistic Abstract Interpretation on Neural Networks
In this section we discuss how to apply the theory of probabilistic abstract interpretation to neural networks. In principle, any neural network is a non-linear function which can be simulated by a computer program; thus any analysis framework that works on a computer program can work on a neural network.

3.1. Probabilistic Abstract Interpretation on a General Layer
A typical layer in a feed-forward neural network can be expressed as an affine transformation followed by a non-linear function, for example in the form

$$x^{t+1} = ReLU(W^t x^t + b^t) \quad (5)$$

where $t$ is the layer number, $x^t$ is the input of the layer, $x^{t+1}$ is the output, $W^t$ and $b^t$ are the weights of the affine transformation, and $ReLU$ is one choice of activation function. We use these notations instead of $f$ to avoid ambiguity with the $f$ of abstract interpretation. To construct a probabilistic abstract interpretation, $f$, $\vec{f}$ and $f^{\#}$ need to be considered. In this case $f : \mathbb{R}^m \to \mathbb{R}^n$ is the forward propagation function. We lift $f : \mathbb{R}^m \to \mathbb{R}^n$ into the probabilistic domain as $\vec{f} : \mathcal{V}(\mathbb{R}^m) \to \mathcal{V}(\mathbb{R}^n)$, which is

$$d(x^{t+1}) = \overrightarrow{ReLU}(\overrightarrow{W^t} d(x^t) + \vec{b}^{\,t}) \quad (6)$$

where $d(x)$ is the probability distribution of $x$. Then from $\vec{f}$ we can construct the abstract transformer $f^{\#}$, which is

$$d^{\#}(x^{t+1}) = ReLU^{\#}(W^{t\#} d^{\#}(x^t) + b^{t\#}) \quad (7)$$

In neural networks it is very common for a layer to be a composition of multiple functions; a typical case is an affine transformation followed by a non-linear activation function. To obtain the abstract transformer, by the definition of the Moore-Penrose pseudo-inverse we have the abstract transformer for $\mathrm{Aff}^t$,

$$\mathrm{Aff}^{t\#} = A' \overrightarrow{\mathrm{Aff}^t} G \quad (8)$$

and the abstract transformer for $ReLU$,

$$ReLU^{\#} = A'' \overrightarrow{ReLU} G' \quad (9)$$

and by Proposition 1 we have

$$(\mathrm{Aff}^t \circ ReLU)^{\#} = \mathrm{Aff}^{t\#} \circ ReLU^{\#} \quad (10)$$
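Definition 1 and the construction of equation (3) can be illustrated with a minimal numerical sketch. Here we use a toy five-point space and a sign abstraction purely for illustration; the variable names are ours:

```python
import numpy as np

# Concrete probability space: distributions over the five points {-2,-1,0,1,2}.
# Abstract space: distributions over the three signs {R-, 0, R+} (a toy choice).
A = np.array([
    [1, 1, 0, 0, 0],   # R-  collects the mass of -2 and -1
    [0, 0, 1, 0, 0],   # 0   collects the mass of 0
    [0, 0, 0, 1, 1],   # R+  collects the mass of 1 and 2
], dtype=float)

G = np.linalg.pinv(A)  # Moore-Penrose pseudo-inverse (Definition 1)

# (i) A G and (ii) G A are orthogonal projections: idempotent and symmetric.
for P in (A @ G, G @ A):
    assert np.allclose(P @ P, P) and np.allclose(P, P.T)

# Abstract transformer f# = A' f G (eq. 3) for the concrete map f(x) = -x,
# whose lifting to distributions is the permutation matrix F reversing points.
F = np.eye(5)[::-1]
f_sharp = A @ F @ G    # here A' = A, since input and output spaces coincide
print(np.round(f_sharp, 3))  # swaps the masses of R- and R+, keeps 0 fixed
```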
In probabilistic abstract interpretation, each possible value a neuron takes carries a corresponding probability, so a prior distribution initialisation needs to be given in the input space. We use the tensor product to combine the probability distributions of all the individual neurons in the input space into an overall distribution representing the input layer. That is, for any two independent neurons,

$$\mathcal{V}(X_1^1 \times X_2^1) = \mathcal{V}(X_1^1) \otimes \mathcal{V}(X_2^1) \quad (11)$$

where $\otimes$ denotes the tensor product. Therefore, the abstract transformer for the first layer, $f^{1\#}$, is

$$(\mathrm{Aff}^1 \circ ReLU)^{\#}(x^1) = ReLU^{\#}(W^{1\#}(d(x_1^1) \otimes \dots \otimes d(x_n^1)) + b^{1\#}) \quad (12)$$

3.2. Probabilistic Abstract Interpretation on a Basic MLP
Consider a simplest MLP, with two layers and five neurons, with the parameters shown below. We use the notation $x^t$ to represent the neuron values in layer $t$. Let the inputs be real values such that $x_1^1, x_2^1 \in [-3, 3]$,

$$W^1 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad b^1 = 0, \quad W^2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad b^2 = 0 \quad (13)$$

and we set both activation functions to ReLU. We take the abstract domain to be $\{R^+, 0, R^-\}$, shown in Figure 2.

Figure 2: Lattice of the abstract domain: a top element $\top$ above the three incomparable elements $R^+$, $0$ and $R^-$, which sit above a bottom element $\bot$.

In the concrete domain we have $f : \mathbb{R}^2 \to \mathbb{R}^2$,

$$(x_1^2, x_2^2) = ReLU(W^1 (x_1^1, x_2^1) + b^1) \quad (14)$$

and we have $\vec{f} : \mathcal{V}(\mathbb{R}^2) \to \mathcal{V}(\mathbb{R}^2)$,

$$d(x^2) = \overrightarrow{ReLU}(\overrightarrow{\mathrm{Aff}^1}(d(x_1^1) \otimes d(x_2^1))) \quad (15)$$

In the abstract domain, suppose we first discuss only integer inputs $x_1^1, x_2^1 \in \mathbb{Z}$ such that

$$x_1^1, x_2^1 \in \{-3, -2, -1, 0, 1, 2, 3\} \quad (16)$$

and initialise the input space with the uniform distribution; we then have

$$d(x_1^1) = d(x_2^1) = (3/7, 1/7, 3/7) \quad (17)$$

and we can compute

$$d(x^2) = ReLU^{\#}(\mathrm{Aff}^{1\#}((3/7, 1/7, 3/7) \otimes (3/7, 1/7, 3/7))) = \begin{pmatrix} 0 & 0 & 0 & 0 & 4/7 & 0 & 0 & 0 & 3/7 \end{pmatrix} \quad (18)$$

That is, after the first layer we will have $(x_1^2 = 0, x_2^2 = 0)$ with probability $4/7$, $(x_1^2 \in R^+, x_2^2 \in R^+)$ with probability $3/7$, and zero probability for the other outcomes. To be more specific, we have

$$\overrightarrow{\mathrm{Aff}^1} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix} \in \mathbb{R}^{13 \times 49} \quad (19)$$

where the entries of a row range over the possible pairs $(x_1^1, x_2^1)$ and the entries of a column range over the possible values of $x_1^1 + x_2^1$. For example, in the first row of $\overrightarrow{\mathrm{Aff}^1}$ only the first entry is 1 and all others are zero: there exists exactly one pair, $(-3, -3)$, that results in $x_1^1 + x_2^1 = -6$, which the first row represents. We also have

$$A' = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\vdots & & & & & & \vdots & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
\vdots & & & & & & \vdots & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix} \in \mathbb{R}^{9 \times 13} \quad (20)$$

which, in the output space, maps a probability distribution vector to an abstract element vector, and

$$G = \begin{pmatrix}
\frac{1}{9} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{9} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\vdots & \vdots & & & & & & & \vdots \\
\frac{1}{9} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix} \in \mathbb{R}^{49 \times 9} \quad (21)$$

which, in the input space, maps an abstract element vector back to a probability distribution vector. As part of the information is lost, we can only map back uniformly. For example, there are nine possible pairs $(x_1^1, x_2^1)$ with both values negative, so we divide the total probability of $(-,-)$ by nine to obtain the probability of each such pair. Then, recalling the definition of probabilistic abstract interpretation, we have

$$\mathrm{Aff}^{1\#} = A' \overrightarrow{\mathrm{Aff}^1} G \quad (22)$$

and similarly we have

$$ReLU^{\#} = A'' \overrightarrow{ReLU} G' \quad (23)$$

which moves the probabilities from the positions of negative values to the positions of zeros.
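The whole first-layer computation of equations (17) to (23) can be reproduced numerically. The following sketch rebuilds $\overrightarrow{\mathrm{Aff}^1}$, $A'$, $G$ and the mass-shifting $ReLU^{\#}$ (whose explicit matrix appears in equation (24) below) and recovers the result of equation (18); the variable names are ours:

```python
import numpy as np
from itertools import product

pairs = list(product(range(-3, 4), repeat=2))   # the 49 input pairs (x1, x2)
sums = list(range(-6, 7))                       # the 13 possible sums x1 + x2

def cat(v):
    """Sign category: 0 for R-, 1 for {0}, 2 for R+."""
    return 0 if v < 0 else (1 if v == 0 else 2)

# Aff1 (eq. 19): moves the mass of each pair to the value of its sum.
Aff1 = np.zeros((13, 49))
for j, (x1, x2) in enumerate(pairs):
    Aff1[sums.index(x1 + x2), j] = 1.0

# A' (eq. 20): both output neurons equal the sum s, so a sum of category c
# lands in the abstract pair (c, c), encoded as index 3*c + c.
Ap = np.zeros((9, 13))
for i, s in enumerate(sums):
    Ap[4 * cat(s), i] = 1.0

# G (eq. 21): lifts each abstract pair back to its concrete pairs, uniformly.
G = np.zeros((49, 9))
for j, (x1, x2) in enumerate(pairs):
    G[j, 3 * cat(x1) + cat(x2)] = 1.0
G /= G.sum(axis=0, keepdims=True)

# ReLU# (eq. 24 below): negative categories collapse onto the zero category.
relu = {0: 1, 1: 1, 2: 2}
ReLU_sharp = np.zeros((9, 9))
for c1 in range(3):
    for c2 in range(3):
        ReLU_sharp[3 * relu[c1] + relu[c2], 3 * c1 + c2] = 1.0

d = np.array([3, 1, 3]) / 7                     # eq. (17)
d_out = ReLU_sharp @ (Ap @ Aff1 @ G) @ np.kron(d, d)
print(np.round(d_out * 7))  # [0 0 0 0 4 0 0 0 3], i.e. 4/7 at (0,0), 3/7 at (+,+)
```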
In this case, it is easy to obtain $ReLU^{\#}$ explicitly:

$$ReLU^{\#} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} \in \mathbb{R}^{9 \times 9} \quad (24)$$

To make the case more general, if we expand the input domain into the reals, $x_1^1, x_2^1 \in \mathbb{R}$ with

$$x_1^1, x_2^1 \in [-3, 3] \quad (25)$$

we have

$$d(x_1^1) = d(x_2^1) = (1/2, 0, 1/2) \quad (26)$$

and

$$\overrightarrow{\mathrm{Aff}^1} \in \lim_{M \to \infty} \lim_{N \to \infty} \mathbb{R}^{N \times M} \quad (27)$$

which moves the positions of pairs after the linear transformation; in the real space $M$ and $N$ are both very large numbers tending to infinity, with $M > N$. We also have

$$A' \in \lim_{N \to \infty} \mathbb{R}^{9 \times N} \quad (28)$$

which classifies the concrete points of the real space into the nine categories of the abstract domain, and

$$G \in \lim_{M \to \infty} \mathbb{R}^{M \times 9} \quad (29)$$

which maps the nine categories of the abstract domain back to the concrete domain. Hence we have

$$\mathrm{Aff}^{1\#} = A' \overrightarrow{\mathrm{Aff}^1} G \quad (30)$$

and

$$ReLU^{\#} = A'' \overrightarrow{ReLU} G' \quad (31)$$

Still, in this particular case $ReLU^{\#}$ is the same matrix as in equation (24),

$$ReLU^{\#} \in \mathbb{R}^{9 \times 9} \quad (32)$$

Therefore

$$d(x^2) = ReLU^{\#}(\mathrm{Aff}^{1\#}((1/2, 0, 1/2) \otimes (1/2, 0, 1/2))) = \begin{pmatrix} 0 & 0 & 0 & 0 & 1/2 & 0 & 0 & 0 & 1/2 \end{pmatrix} \quad (33)$$

That is, after the first layer we will have $(x_1^2 = 0, x_2^2 = 0)$ with probability $1/2$, $(x_1^2 \in R^+, x_2^2 \in R^+)$ with probability $1/2$, and probability 0 for the other outcomes.

3.3. Probabilistic Abstract Interpretation on a Convolutional Layer
A convolutional neural network is in principle a special case of fully-connected layers, except that many weights are zero. We illustrate a simple example of applying abstract interpretation to a convolutional layer. Let the input space be $X \in \mathbb{R}^{4 \times 4}$ and the convolution filter $g \in \mathbb{R}^{2 \times 2}$; the output space is then $Y \in \mathbb{R}^{(4-2+1) \times (4-2+1)} = \mathbb{R}^{3 \times 3}$. As for a well-trained model, the weights of $g$ are already given and fixed. Let

$$g = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix} \quad (34)$$

To express the convolution computation in formal mathematical notation, we can write

$$y = ReLU(W x + b) \quad (35)$$

where

$$W = \begin{pmatrix}
g_{11} & g_{12} & 0 & 0 & g_{21} & g_{22} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & g_{11} & g_{12} & 0 & 0 & g_{21} & g_{22} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & g_{11} & g_{12} & 0 & 0 & g_{21} & g_{22} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & g_{11} & g_{12} & 0 & 0 & g_{21} & g_{22} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & g_{11} & g_{12} & 0 & 0 & g_{21} & g_{22} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & g_{11} & g_{12} & 0 & 0 & g_{21} & g_{22} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & g_{11} & g_{12} & 0 & 0 & g_{21} & g_{22} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & g_{11} & g_{12} & 0 & 0 & g_{21} & g_{22} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & g_{11} & g_{12} & 0 & 0 & g_{21} & g_{22}
\end{pmatrix}$$

and we choose ReLU as the activation function, as it is the most commonly used one for convolutional layers. For simplicity we let $b$ be the zero vector, and let

$$g = \begin{pmatrix} 1 & -1 \\ 1 & -1 \end{pmatrix} \quad (36)$$

We are again concerned with positiveness/negativeness, which in a computer vision application corresponds to the darkness of pixels. This choice of convolution filter is a simple version of an edge detection filter. Given the ReLU activation, we can let the abstract domain be $\{R^-, 0, R^+\}$. In the concrete domain we have $f : \mathbb{R}^{16} \to \mathbb{R}^9$,

$$(y_1, \dots, y_9) = ReLU(W (x_1, \dots, x_{16}) + b) \quad (37)$$

Suppose we initialise the input space $X$ with a uniform distribution over $\mathbb{R}$, that is, $d(x_i) = (1/2, 0, 1/2)$. To lift up to the probability domain we have $\vec{f} : \mathcal{V}(\mathbb{R}^{16}) \to \mathcal{V}(\mathbb{R}^9)$,

$$d(y) = \overrightarrow{ReLU}(\overrightarrow{\mathrm{Aff}}(d(x_1) \otimes \dots \otimes d(x_{16}))) \quad (38)$$

Concretely, we can see that after convolution each entry has probability 1/2 of being negative and 1/2 of being positive.
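Both the banded structure of $W$ in equation (35) and this sign claim can be checked with a short sketch; the helper function below is ours, written for this illustration:

```python
import numpy as np

def conv_as_matrix(g, h=4, w=4):
    """Unroll a 'valid' 2D convolution with filter g into the matrix W of eq. (35)."""
    kh, kw = g.shape
    oh, ow = h - kh + 1, w - kw + 1
    W = np.zeros((oh * ow, h * w))
    for i in range(oh):
        for j in range(ow):
            for di in range(kh):
                for dj in range(kw):
                    W[i * ow + j, (i + di) * w + (j + dj)] = g[di, dj]
    return W

g = np.array([[1.0, -1.0], [1.0, -1.0]])     # the edge filter of eq. (36)
W = conv_as_matrix(g)                         # shape (9, 16), banded as in eq. (35)

# Monte Carlo check of the sign claim: with inputs symmetric around zero,
# each pre-activation entry is negative or positive with probability 1/2 each.
x = np.random.randn(100_000, 16)
y = x @ W.T
print((y > 0).mean(axis=0).round(2))          # ~0.5 for each of the 9 entries
```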
Therefore, applying the ReLU function,

$$d(y_i) = (0, 1/2, 1/2) \quad (39)$$

the output will consequently have probability 1/2 of being 0 and probability 1/2 of being positive, as ReLU transforms all negative values into zero.

3.4. Zonotopes Approximation as Abstract Domain
In 2018, Gehr et al. introduced $AI^2$, an analyzer which can automatically prove properties, especially safety and robustness, of convolutional neural networks based on classical abstract interpretation [28]. $AI^2$ found a general way to define the abstract domain and the abstract transformers of different types of layers, including fully-connected, convolutional and max-pooling layers. It defines suitable abstract domains and represents neural network layers as CAT functions, and hence defines abstract transformers which can be applied and computed on abstract elements in a very general way. The formal definitions and details can be found in [28]. It uses zonotopes as an abstract domain, and in the probabilistic framework we can also follow this abstract domain. We take the following example from [28] to illustrate. Given the function $f : \mathbb{R}^2 \to \mathbb{R}^2$,

$$f(x) = \begin{pmatrix} 2 & -1 \\ 0 & 1 \end{pmatrix} x \quad (40)$$

with input space $X$ expressed by the zonotope $z^1 : [-1, 1]^3 \to \mathbb{R}^2$,

$$z^1(\epsilon_1, \epsilon_2, \epsilon_3) = (1 + 0.5\epsilon_1 + 0.5\epsilon_2, \; 2 + 0.5\epsilon_1 + 0.5\epsilon_3) \quad (41)$$

followed by a ReLU transformation, shown in Figures 3 and 4. We illustrate this zonotope approximation first in integer space and then generalise it to real space.

Figure 3: Abstract elements through the layer.

3.4.1. Zonotopes Approximation in Integer Space
We first define the abstract transformer for the probabilistic framework, $f^{\#}$, together with the Moore-Penrose pseudo-inverse functions $A$ and $G$, in integer space. In the concrete domain we have overall 4 input points, $X = \{(0, 1), (1, 1), (1, 3), (2, 2)\}$, and we lift them into probability space with the uniform distribution to get $d(X) = (1/4, 1/4, 1/4, 1/4)$.

Figure 4: The ReLU transformation.

In the abstract domain, the integer space contained in $z^1$ is $I^2 = \{(0, 1), (0, 2), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3)\}$. As we abstract the input probability space into the abstract domain, we have

$$A(d(X)) = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{pmatrix} \begin{pmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{pmatrix} = \begin{pmatrix} 1/4 \\ 0 \\ 1/4 \\ 0 \\ 1/4 \\ 1/4 \\ 0 \end{pmatrix} \quad (42)$$

and for the opposite direction we have

$$G(A(d(X))) = \begin{pmatrix}
\frac14 & \frac14 & \frac14 & \frac14 & \frac14 & \frac14 & \frac14 \\
\frac14 & \frac14 & \frac14 & \frac14 & \frac14 & \frac14 & \frac14 \\
\frac14 & \frac14 & \frac14 & \frac14 & \frac14 & \frac14 & \frac14 \\
\frac14 & \frac14 & \frac14 & \frac14 & \frac14 & \frac14 & \frac14
\end{pmatrix} \begin{pmatrix} 1/4 \\ 0 \\ 1/4 \\ 0 \\ 1/4 \\ 1/4 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{pmatrix} \quad (43)$$

In particular we can also see that during the linear transformation the distribution over the input space does not change: each point in $X$ corresponds to a point in $f(X)$ and vice versa. Therefore $f^{\#} : \mathcal{V}(\mathbb{R}^2) \to \mathcal{V}(\mathbb{R}^2)$,

$$f^{\#} = A' \vec{f} G = A' G \quad (44)$$

Similarly, the abstract transformer for ReLU is $ReLU^{\#} : \mathcal{V}(\mathbb{R}^2) \to \mathcal{V}(\mathbb{R}^2)$,

$$ReLU^{\#} = A'' \overrightarrow{ReLU} G' = A'' G' = \begin{pmatrix}
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} \begin{pmatrix}
\frac14 & \frac14 & \cdots & \frac14 \\
\frac14 & \frac14 & \cdots & \frac14 \\
\frac14 & \frac14 & \cdots & \frac14 \\
\frac14 & \frac14 & \cdots & \frac14
\end{pmatrix} \quad (45)$$

where the first factor is $A'' \in \mathbb{R}^{8 \times 4}$ and the second factor is $G' \in \mathbb{R}^{4 \times 11}$, with all entries $\frac14$.
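As an illustration, the zonotope $z^1$ of equation (41) and its images under $f$ and ReLU, as depicted in Figures 3 and 4, can also be explored by direct sampling. The following is a sketch, not part of the $AI^2$ toolchain:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample the error symbols of the zonotope z^1 of eq. (41) uniformly.
eps = rng.uniform(-1.0, 1.0, size=(10_000, 3))
X = np.column_stack([1 + 0.5 * eps[:, 0] + 0.5 * eps[:, 1],
                     2 + 0.5 * eps[:, 0] + 0.5 * eps[:, 2]])   # points of z^1

F = np.array([[2.0, -1.0],
              [0.0,  1.0]])        # the linear map f of eq. (40)
Y = X @ F.T                        # points of z^2 = f(z^1), cf. Figure 3
Z = np.maximum(Y, 0.0)             # the ReLU image, cf. Figure 4

# Fraction of sampled points that ReLU actually clips in the first coordinate:
print((Y[:, 0] < 0).mean())        # about 1/2 for this input distribution
```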
3.4.2. Zonotopes Approximation in Real Space
The integer space may be a special scenario; more generally we have to expand into real space. Let us first look at the linear transformation from $z^1$ to $z^2$. Suppose in the concrete domain we have $N$ different points in arbitrary positions, $X = \{x_1, x_2, \dots, x_N\}$, with $X$ over-approximated by $z^1$. If we lift $X$ into probability space with the uniform distribution, we have $d(X) = (\frac{1}{N}, \frac{1}{N}, \dots, \frac{1}{N})$. In the abstract domain, we know that mathematically there are infinitely many points of the real space contained in $z^1$, but we can regard this as a very large number and let it be $\lim_{M \to \infty} M$. Then we have

$$A \in \lim_{M \to \infty} \mathbb{R}^{M \times N} \quad (46)$$

with an entry 1 in each column, in the corresponding position, and all other entries zero, and

$$G \in \lim_{M \to \infty} \mathbb{R}^{N \times M} \quad (47)$$

with all entries $\frac{1}{N}$. Now we can let the same very large number $\lim_{M \to \infty} M$ be the number of all real points contained in $z^2$, so

$$A' = A, \quad G' = G \quad (48)$$

As it is a linear transformation, again we have $\vec{f} : \mathcal{V}(\mathbb{R}^2) \to \mathcal{V}(\mathbb{R}^2)$,

$$\vec{f}(d) = d \quad (49)$$

and hence $f^{\#} : \mathcal{V}(\mathbb{R}^2) \to \mathcal{V}(\mathbb{R}^2)$,

$$f^{\#} = A' \vec{f} G = A' G \quad (50)$$

For the ReLU transformation, we have to consider $ReLU(f(X))$. We claim that if $\forall x_i, x_j \in ReLU(f(X))$ it holds that $x_{i1} \neq x_{j1} \wedge x_{i2} \neq x_{j2}$, then $\overrightarrow{ReLU} : \mathcal{V}(\mathbb{R}^2) \to \mathcal{V}(\mathbb{R}^2)$ satisfies

$$\overrightarrow{ReLU}(d) = d \quad (51)$$

and

$$A'' = A', \quad G'' = G' \quad (52)$$

Otherwise, suppose there exists exactly one pair $x_p, x_q \in ReLU(f(X))$ (with $p < q$) such that $x_{p2} = x_{q2}$; then

$$\overrightarrow{ReLU} = \begin{pmatrix}
1 & 0 & \cdots & & & & \cdots & 0 \\
0 & 1 & & & & & & \vdots \\
\vdots & & \ddots & & & & & \\
& & & 1 & \cdots & 1 & & \\
\vdots & & & & & & \ddots & \\
& & & & & & 1 & 0 \\
0 & \cdots & & & & \cdots & 0 & 1
\end{pmatrix} \in \mathbb{R}^{(N-1) \times N} \quad (53)$$

which can be obtained by eliminating the $q$th row from the identity matrix and assigning $\overrightarrow{ReLU}_{pq} = 1$. That is,

$$\overrightarrow{ReLU} \begin{pmatrix} \frac{1}{N} \\ \frac{1}{N} \\ \vdots \\ \frac{1}{N} \end{pmatrix} = \begin{pmatrix} \frac{1}{N} \\ \vdots \\ \frac{2}{N} \\ \vdots \\ \frac{1}{N} \end{pmatrix} \quad (54)$$

with $\frac{2}{N}$ in the $p$th entry. Then

$$A'' \in \lim_{M \to \infty} \mathbb{R}^{M \times (N-1)} \quad (55)$$

with an entry 1 in each column, in the corresponding position, and all other entries zero, and

$$G'' \in \lim_{M \to \infty} \mathbb{R}^{(N-1) \times M} \quad (56)$$

with all entries $\frac{1}{N-1}$. Therefore

$$ReLU^{\#} = A'' \overrightarrow{ReLU} G' \quad (57)$$

Obviously, if there exist $r$ exclusive pairs $x_p, x_q \in ReLU(f(X))$ such that $x_{p2} = x_{q2}$, then

$$\overrightarrow{ReLU} \in \mathbb{R}^{(N-r) \times N} \quad (58)$$

and if we let $N \to M$ we have

$$\lim_{N \to M} \overrightarrow{ReLU} \in \mathbb{R}^{(M/2) \times M} \quad (59)$$

3.5. Grids Approximation as Abstract Domain
We see that in real space we encounter the problem of infinity: since there are infinitely many points in any closed region of real space, the dimensions of both the input space and the abstract element tend to infinity, and hence we are forced to handle vectors and matrices of infinite dimension. We always wish to avoid this. A possible approximation is to use grids. That is, in the former example, instead of considering the space $\mathbb{R}^2$, we consider $\epsilon\mathbb{Z}^2$, where $\epsilon$ gives the chosen grid size. For example, if we let $\epsilon$ be 0.01, then the grid size is $0.01 \times 0.01$, which means we have $100 \times 100$ points in a closed space bounded by a box of size $1 \times 1$ (disregarding points on the boundary lines; we could have $101 \times 101$ points more precisely). In this way, each point represents the local probability of its neighbourhood, which still shows the probability density distribution of the space. For convenience we always let the $x_i$-axes lie among the lines of the grid, which determines the position of the whole grid. Now if we consider $z^1$ in $\epsilon\mathbb{Z}^2$, we have in total 30301 points in this abstract element. That is, recalling the notation $M$ used in the last subsection, instead of having $\lim_{M \to \infty} M$ points we now have $M = 30301$. The same holds for the input space $X$ in $\epsilon\mathbb{Z}^2$: no matter what $X$ is, $N$ is finite and computable.
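For instance, the count of 30301 grid points of $z^1$ can be reproduced with the following sketch, where the membership test directly encodes the zonotope of equation (41):

```python
import numpy as np

xs = np.linspace(0.0, 2.0, 201)    # grid lines of 0.01*Z covering [0, 2]
ys = np.linspace(1.0, 3.0, 201)    # grid lines of 0.01*Z covering [1, 3]

def in_z1(x, y, tol=1e-9):
    # (x, y) lies in z^1 of eq. (41) iff some e1 in [-1, 1] exists with
    # e2 = 2(x - 1) - e1 and e3 = 2(y - 2) - e1 both in [-1, 1].
    a, b = 2.0 * (x - 1.0), 2.0 * (y - 2.0)
    return max(a - 1.0, b - 1.0, -1.0) <= min(a + 1.0, b + 1.0, 1.0) + tol

count = sum(in_z1(x, y) for x in xs for y in ys)
print(count)                       # 30301 points of z^1 in the 0.01 grid
```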
Therefore we have $A : \mathcal{V}(0.01\mathbb{Z}^2) \to \mathcal{V}(0.01\mathbb{Z}^2)$,

$$A \in \mathbb{R}^{M \times N} = \mathbb{R}^{30301 \times N} \quad (60)$$

and similarly for $G$. Any invertible linear transformation on $\mathbb{R}^2$ is a bijection, and $\epsilon\mathbb{Z}^2 \subset \mathbb{R}^2$, hence such a linear transformation is also injective on $\epsilon\mathbb{Z}^2$. So we can have $\vec{f} : \mathcal{V}(\epsilon\mathbb{Z}^2) \to \mathcal{V}(\epsilon\mathbb{Z}^2)$,

$$\vec{f}(d) = d \quad (61)$$

Similarly, $z^2$ in $\mathcal{V}(\epsilon\mathbb{Z}^2)$ contains 60401 points, so

$$A' \in \mathbb{R}^{M \times N} = \mathbb{R}^{60401 \times N} \quad (62)$$

and similarly for $G'$. Then we have $f^{\#} : \mathcal{V}(\epsilon\mathbb{Z}^2) \to \mathcal{V}(\epsilon\mathbb{Z}^2)$,

$$f^{\#} = A' \vec{f} G = A' G \quad (63)$$

Similarly for $ReLU^{\#}$: we can now replace $\lim_{M \to \infty} M$ by finite, computable numbers.

The reason $AI^2$ takes zonotopes as its abstract domain is that it is concerned with the robustness of a neural network, and a sound yet precise abstract element which over-approximates the input space is a good way to prove robustness. In probabilistic abstract interpretation we usually have a different aim: it is concerned more with the density distribution of the input space and its change through the propagation of a neural network. Therefore it is more meaningful to use other abstract domains to reach this purpose. Following the grid approximation, using a sparser grid as an abstract domain is a fair choice. In this context let $z^1$ in $0.01\mathbb{Z}^2$ be the input space, i.e., $X = \{x \in z^1 \mid x \in 0.01\mathbb{Z}^2\}$. Then we let the abstract domain be the space $\mathbb{Z}^2$. We can choose different shapes of grid cells; as an instance, we choose squares. That is, we abstract the 10,000 points in each $1 \times 1$ square of $0.01\mathbb{Z}^2$ into the single point at the centre of the square, whose probability is the sum of the probabilities of all 10,000 points. Say we consider the coordinates $x \in (-3, 3)$ and $y \in (-3, 3)$; then

$$A \in \mathbb{R}^{49 \times 490{,}000}, \quad G \in \mathbb{R}^{490{,}000 \times 49} \quad (64)$$

and $A'$ and $G'$ are similar. A concrete element is a vector of length 490,000 and an abstract element is a vector of length 49, with each entry being a probability value. We can see that an advantage of the grid approximation is that it saves computational complexity: $\vec{f}$ is very computation-consuming, as 490,000 affine transformation calculations are needed, while $f^{\#}$ does only 49 such calculations. To lift $f$ to $\vec{f}$, we apply the affine transformation $f$, and the resulting point gives the position of the entry of $\vec{f}$. Besides squares, we can also have different shapes of grid cells, such as diamonds; different shapes lead to different approximations of the summation of probabilities.

4. Experiment
In this section we give an example which takes an MNIST hand-written digit classifier to illustrate the analysis of probabilistic abstract interpretation on a convolutional neural network.

4.1. The Dataset
We obtain the open-source dataset 'MNIST Digit Dataset' from Kaggle (https://www.kaggle.com/competitions/digit-recognizer). The dataset contains about 40,000 handwritten images of digits. Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.

4.2. The Model
We design our model as an eight-layer convolutional neural network, comprising two $28 \times 28 \times 32$ convolutional layers with $5 \times 5$ filters followed by a $2 \times 2$ max-pooling layer, then another two $14 \times 14 \times 64$ convolutional layers with $3 \times 3$ filters followed by a $2 \times 2$ max-pooling layer, and finally two dense layers. In total the network contains 887,530 parameters (see Figure 5). We choose RMSprop as our optimizer and categorical cross-entropy as our loss function. In the final epoch of training, our model achieved accuracy 0.8917 and loss 0.3446.

Figure 5: (a) A plot of the model architecture of the digit classifier. (b) A table of the model architecture showing the total number of weights and the number of weights in each layer.
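For reproducibility, a Keras sketch matching this description is given below. The width of the first dense layer (256) is our assumption, chosen because it reproduces the stated total of 887,530 parameters:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 5, padding="same", activation="relu"),  # 28x28x32
    layers.Conv2D(32, 5, padding="same", activation="relu"),  # 28x28x32
    layers.MaxPooling2D(2),                                   # 14x14x32
    layers.Conv2D(64, 3, padding="same", activation="relu"),  # 14x14x64
    layers.Conv2D(64, 3, padding="same", activation="relu"),  # 14x14x64
    layers.MaxPooling2D(2),                                   # 7x7x64
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # assumed width; see note above
    layers.Dense(10, activation="softmax"), # one class per digit
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # reports 887,530 trainable parameters with these choices
```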
4.3. Abstract Domain
An input image has 784 pixels and each pixel has 256 possible pixel-values, so there are $256^{784}$ possible input images and we cannot run them exhaustively. We take the grids approximation as our abstract domain. For each pixel, we classify its pixel-value into just two categories: the value 0 as dark and 255 as bright. We also combine each neighbourhood of $7 \times 7$ pixels with the same pixel-value; that is, we divide an image into 16 groups of pixels. A grid cell in this case is 16-dimensional, with two possible values in each dimension, so in total we have $2^{16}$ cells in this approximation. The analysis shows, for example, that the network will more likely recognise a bright top-left corner as a feature of the digit '3', and a bright bottom-right corner as a feature of the digit '2' or '9'.

4.4. Distribution Initialisation
We can sample the density initialisation of the inputs from the training set. In this approximation we have $2^{16}$ grid cells. We map each training image onto its corresponding cell and calculate the sample probability of each cell with respect to the training set. The prediction for a high-probability cell is more likely to occur.

4.5. Evaluation
[28] gives an example of a classical abstract interpretation analyzer proving robustness against a brightening attack on an MNIST digit classifier. That analyzer is able to prove whether a network classifies correctly under an attack which perturbs an image by changing all pixels above a particular threshold to the brightest possible value. Probabilistic abstract interpretation aims for a different purpose: it (i) extracts features of the network and (ii) shows the probability density flow of all possible inputs towards the prediction.

5. Conclusion and Future Work
We have discussed applying the theory of probabilistic abstract interpretation to various kinds of neural network layers. We can see that probabilistic abstract interpretation can give information on the density distribution flow of the possible inputs, which in particular cases is an advantage compared to the classical abstract interpretation analysis framework, which is sound but sometimes not precise enough. The discussion also shows the problems remaining in our approach. First, the examples we illustrated were mostly low-dimensional, while neural networks very commonly work in very high dimensions. Secondly, we should discuss more kinds of abstract domains and abstract transformers, in addition to the grids approximation, in order to fit different application situations. We leave these as future work.

References
[1] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436.
[2] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (Vol. 1). Cambridge: MIT Press.
[3] Hailesilassie, T. (2016). Rule extraction algorithm for deep neural networks: A review. arXiv preprint arXiv:1610.05267.
[4] Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR abs/1312.6034.
[5] Nguyen, A., Dosovitskiy, A., Yosinski, J., Brox, T., & Clune, J. (2016). Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain (pp. 3387–3395).
[6] Bazen, S., & Joutard, X. (2013). The Taylor decomposition: A unified generalization of the Oaxaca method to nonlinear models. Working papers, HAL.
[7] Landecker, W., Thomure, M. D., Bettencourt, L. M. A., Mitchell, M., Kenyon, G. T., & Brumby, S. P. (2013). Interpreting individual classifications of hierarchical networks. In IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013), Singapore (pp. 32–38).
[8] Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7), 1–46.
[9] Samek, W., Wiegand, T., & Müller, K. R. (2017). Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296.
[10] Zilke, J. R., Loza Mencía, E., & Janssen, F. (2016). DeepRED: Rule extraction from deep neural networks. In International Conference on Discovery Science (pp. 457–473). Springer, Cham.
[11] Montavon, G., Samek, W., & Müller, K. R. (2017). Methods for interpreting and understanding deep neural networks. Digital Signal Processing.
[12] Jacobsson, H. (2005). Rule extraction from recurrent neural networks: A taxonomy and review. Neural Computation, 17(6), 1223–1263.
[13] Wang, Q., Zhang, K., Ororbia II, A. G., Xing, X., Liu, X., & Giles, C. L. (2017). An empirical evaluation of recurrent neural network rule extraction. arXiv preprint arXiv:1709.10380.
[14] Zhang, Q., Cao, R., Shi, F., Wu, Y. N., & Zhu, S.-C. (2018). Interpreting CNN knowledge via an explanatory graph. AAAI 2018.
[15] Zhang, Q., Wu, Y. N., & Zhu, S.-C. (2018). Interpretable convolutional neural networks. CVPR 2018.
[16] Zhou, B., Bau, D., Oliva, A., & Torralba, A. (2018). Interpreting deep visual representations via network dissection. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[17] Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818–833). Springer, Cham.
[18] Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network dissection: Quantifying interpretability of deep visual representations. CVPR 2017.
[19] Carlini, N., & Wagner, D. (2017). Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP) (pp. 39–57). IEEE.
[20] Nielson, F., Nielson, H. R., & Hankin, C. (2015). Principles of program analysis. Springer.
[21] Cousot, P., & Cousot, R. (1977). Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (pp. 238–252). ACM.
[22] Polignano, M., Musto, C., Pellungrini, R., Purificato, E., Semeraro, G., & Setzu, M. (2024). XAI.it 2024: An overview on the future of explainable AI in the era of large language models. In Proceedings of the 5th Italian Workshop on Explainable Artificial Intelligence (XAI.it), co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence, Bolzano, Italy, November 25–28, 2024. CEUR-WS.org.
[23] Nielson, F., & Jones, N. (1994). Abstract interpretation: A semantics-based tool for program analysis. Handbook of Logic in Computer Science, 4, 527–636.
[24] Cousot, P., & Monerau, M. (2012). Probabilistic abstract interpretation. In European Symposium on Programming (pp. 169–193). Springer, Berlin, Heidelberg.
[25] Di Pierro, A., & Wiklicky, H. (2000). Concurrent constraint programming: Towards probabilistic abstract interpretation. In Proceedings of the 2nd ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming (pp. 127–138).
[26] Di Pierro, A., Hankin, C., & Wiklicky, H. (2010). Probabilistic semantics and program analysis. In Formal Methods for Quantitative Aspects of Programming Languages (pp. 1–42). Springer, Berlin, Heidelberg.
[27] Di Pierro, A., & Wiklicky, H. (2016). Probabilistic abstract interpretation: From trace semantics to DTMC's and linear regression. In Semantics, Logics, and Calculi (pp. 111–139). Springer, Cham.
[28] Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., & Vechev, M. (2018). AI²: Safety and robustness certification of neural networks with abstract interpretation.
[29] Singh, G., Gehr, T., Püschel, M., & Vechev, M. (2019). An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL), 41.
[30] Katz, G., Barrett, C., Dill, D. L., Julian, K., & Kochenderfer, M. J. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. In Computer Aided Verification: 29th International Conference (CAV 2017), Heidelberg, Germany, Proceedings, Part I (pp. 97–117). Springer International Publishing.
[31] Mirman, M., Gehr, T., & Vechev, M. (2018). Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning (pp. 3578–3586). PMLR.
[32] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
[33] Wiklicky, H. Probabilistic programs [lecture notes]. Imperial College London, Program Analysis (COMP70020). https://www.doc.ic.ac.uk/~herbert/teaching/PA7S.pdf
[34] Tic-tac-toe with a neural network. Nested Software. (2019, December 27). Retrieved June 25, 2022, from https://nestedsoftware.com/2019/12/27/tic-tac-toe-with-a-neural-network-1fjn.206436.html
[35] Reisch, S. (1980). Gobang ist PSPACE-vollständig (Gomoku is PSPACE-complete). Acta Informatica, 13, 59–66.