Toward Geometrical Robustness with Hybrid Deep Learning and Differential Invariants Theory

Pierre-Yves Lagrave, Mathieu Riou
Thales Research and Technology France, 1 avenue Augustin Fresnel, 91767 Palaiseau cedex
pierre-yves.lagrave@thalesgroup.com, mathieu.riou@thalesgroup.com

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Symmetries are ubiquitous in physics problems and should be taken into account when neural networks are used to approximate their solutions. Embedding symmetries within neural networks by using equivariant layers has been shown to be efficient from an accuracy standpoint. Building equivariant structures also appears appealing from a robustness standpoint, since the use of correct-by-design algorithms alleviates the verification step, which is a prerequisite to any critical application such as safety and military related tasks. However, generically enforcing equivariance in neural networks requires the use of cumbersome operators such as group-based convolution kernels, whose outputs may be hard to interpret. In this paper, we introduce EqPdeNet, an alternative method in which equivariant partial differential equations are embedded within the first layer of a neural network. This approach provides approximate equivariance with respect to any Lie group action and allows combining several types of equivariance within the same network. Moreover, the structure of the associated partial differential equations can be directly related to the physical nature of the input data, making this approach particularly appealing from an interpretability standpoint when compared to the use of group-based convolution kernels.
Introduction

Symmetries are ubiquitous in physics, with finite groups of symmetries such as the hexagonal lattice of graphene and continuous groups such as the Lorentz group in particle physics. Highlighted by their successes in image and speech recognition (Szegedy et al. 2017), (Xiong et al. 2016), neural networks are now used in various physics fields such as fluid mechanics (Raissi, Perdikaris, and Karniadakis 2019), (Raissi, Yazdani, and Karniadakis 2020), high energy physics (Baldi, Sadowski, and Whiteson 2014) or condensed matter physics (Carrasquilla and Melko 2017), (Van Nieuwenburg, Liu, and Huber 2017).

Convolutional Neural Networks (CNN) (LeCun et al. 1998) have demonstrated the efficiency of embedding translation symmetries into the design of a neural network for image processing tasks. More generally, directly encoding the required symmetries into the algorithm design decreases the number of parameters and increases the robustness. Using algorithms in which given properties are enforced through their specification (correct-by-design algorithms) also has the advantage of being more amenable to critical applications such as safety and military related tasks. In regard to the diversity of symmetries occurring in physics, a generic approach for embedding symmetry groups into the neural network design is needed. Group-Convolutional Neural Networks (G-CNN), first introduced by (Cohen and Welling 2016), have recently been extended to generic groups of symmetries (Finzi et al. 2020) and achieve this purpose. However, they rely on a cumbersome specification and are hard to interpret.

In this paper, we introduce the EqPdeNet neural network by leveraging on the differential invariants of Lie group actions. Within EqPdeNet, equivariant inner representations of the input data are built through the first layer of the neural network and are then processed by deeper fully connected layers, following the hierarchical structure of the usual CNN. This hybrid approach applies to any Lie group without requiring the group action to act transitively on the input manifold, and increases the accuracy and the robustness by achieving approximate equivariance. In addition, the structure of the associated Partial Differential Equations (PDE) can be directly related to the physical nature of the input data, making this approach particularly appealing from an interpretability standpoint when compared to the use of group-based convolution kernels.

Related Work and Contribution

We review in the following the related work, first focusing on G-CNN and then highlighting the existing duality between neural networks and PDE.

Group-Convolutional Neural Networks

The success of CNN for image processing tasks has motivated several works on the generalization of their translation equivariant layers to other types of transforms. In this context, (Cohen and Welling 2016) introduced the concept of Group-Convolutional Neural Network (G-CNN) by extending the principle of weight sharing to satisfy symmetries other than translations, focusing on discrete groups such as p4 and p4m. Other works focused on specific symmetry groups such as the permutation group (Zaheer et al. 2017), some discrete subgroups of the 2-dimensional rotation group SO(2) (Marcos, Volpi, and Tuia 2016), SO(2) itself (Oyallon and Mallat 2015), (Worrall et al. 2017), (Weiler, Hamprecht, and Storath 2018), and the 3-dimensional translation and rotation groups (Cohen et al. 2018), (Esteves et al. 2018).

These approaches were later generalized to more general sets of transforms and, in particular, to those arising from a transitive action of a Lie group (Gens and Domingos 2014), (Huang et al. 2017), (Bekkers 2019). Recently, a generic approach was proposed (Finzi et al. 2020) without requiring the group action to be transitive.

All these works aim at generalizing the usual CNN structure by building equivariant layers to make the overall network equivariant. Our approach rather aims at achieving approximate equivariance through the use of one PDE layer and does not use group-based convolution kernels.
PDE-Based Neural Networks

Motivated by the universal approximation theorem (Hornik et al. 1989), neural networks have been used to approximate the solutions of PDE. A major work in this area is the introduction of the Physics Informed Neural Network (PINN) approach (Raissi, Perdikaris, and Karniadakis 2019) as an alternative to the usual finite difference methods.

(Chen et al. 2018) emphasizes that a residual neural network can actually be seen as a discretization of an unknown Ordinary Differential Equation (ODE) and shows how to efficiently learn the ODE parameters from the data by using adjoint techniques and classical ODE solvers. Building on similar ideas, (Ruthotto and Haber 2019) uses the ODE formulation of the neural network to introduce inductive bias, such as parabolic or hyperbolic properties, to enforce respectively robustness to perturbations and low memory usage. The use of a differential equation formulation to embed desired properties into the neural network is a common feature with our work. However, the question of symmetry is not considered in these works.

The use of PDE has also appeared useful for building equivariant structures. (Shen et al. 2020) introduces a kernel equivariant to the isometry group SE(2) from differential operators approximated through usual kernel convolutions. In (Smets et al. 2020), a neural network equivariant to a generic transitive Lie group action is proposed by using several layers of equivariant PDE, the training of the algorithm consisting in finding the parameters of the PDE. Less recently, but closer to the present work, (Fang et al. 2017) have used a single PDE to extract equivariant features for a linear classifier by leveraging on differential invariants theory.

The hybrid approach we are proposing is applicable to any Lie group action provided that a generating set of differential invariants can be efficiently computed, and it allows for several group actions to be considered simultaneously. Also, thanks to a convolution-based integration of the PDE layer, an end-to-end training can be performed within an automatic differentiation framework such as TensorFlow or PyTorch.

Contribution

The main contributions of this paper are the following:

• We introduce the EqPdeNet hybrid architecture featuring a first equivariant PDE layer, built by leveraging on the differential invariants of Lie group actions and followed by usual fully connected layers. Our approach in particular allows considering several types of equivariance within the first PDE layer.

• We give some numerical evidence to support the interest of our approach from both performance and robustness standpoints by performing comparisons with the behavior of usual neural networks on the ROTMNIST dataset (Larochelle et al. 2007).

• We provide a numerical integration scheme for arbitrary PDE by using discrete convolution operators, making the EqPdeNet approach compatible with end-to-end training through back-propagation within an automatic differentiation framework.

Invariance and PDE

By leveraging on the formalism introduced in (Olver 1993), we give in the following some general background about invariance theory for PDE. This will allow us to introduce the notion of differential invariants of a Lie group action, which is central to our work, and to explain how to build equivariant representations of input data by solving a specific type of PDE.

Symmetry Group

Formally, we will see a PDE of order n in p independent variables x = (x_1, ..., x_p) ∈ X and one dependent variable u = u(x_1, ..., x_p) ∈ U as an equation involving x, u and u_α = ∂_α u, for α ∈ N^k, k ≥ 0 and |α| ≤ n. A PDE solution will be of the form u = f(x).

In the following, we denote by X = R^p, with coordinates (x_1, ..., x_p), the space of the independent variables and by U = R, with coordinates u, that of the dependent variable. Let us then consider a Lie group G of dimension m acting as g.(x, u) on a sub-manifold M ⊆ X × U, with its Lie algebra g generated by the vector fields ζ_1, ..., ζ_m. For instance, G could be the 2-dimensional rotation group SO(2) acting on X × U ≃ R^2 with the infinitesimal generator ζ_1 = −u ∂_x + x ∂_u.
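As a purely illustrative sketch (our own addition, assuming only numpy, and not part of the original formalism beyond what is stated above), the flow of the generator ζ_1 = −u ∂_x + x ∂_u is the usual planar rotation, and applying it to the sampled graph of a function realizes the action g.(x, u) used throughout this section:

```python
import numpy as np

def so2_action(points, angle):
    """Apply g in SO(2) to points (x, u) of a functional graph.

    Exponentiating the generator zeta_1 = -u d/dx + x d/du for a "time"
    equal to `angle` yields the standard 2x2 rotation matrix acting on (x, u).
    """
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s],
                    [s,  c]])
    return points @ rot.T

# Graph of f(x) = exp(-x^2) sampled on [-2, 2], and its transform g.Gamma_f
x = np.linspace(-2.0, 2.0, 200)
graph = np.stack([x, np.exp(-x ** 2)], axis=1)     # points (x, f(x))
rotated_graph = so2_action(graph, np.pi / 6)       # rotation by 30 degrees
```

For small enough angles the transformed point set is still the graph of a function, which is the setting illustrated on Figure 1 below.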
We can define the transform of a function u = f(x) under the action of G by identifying f with its graph Γ_f = {(x, f(x)), x ∈ Ω ⊆ X} ⊆ X × U and by defining g.f = f_g, where the function f_g is the function associated with the transformed graph g.Γ_f, defined as follows for g ∈ G:

    g.Γ_f = {g.(x, f(x)), (x, f(x)) ∈ Γ_f} = Γ_{f_g}    (1)

These notions of transformed function and group action on functional graphs are illustrated on Figure 1, where the graph of a function f : R → R (left) is transformed according to the action of a group element g ∈ SO(2) by simple rotation (right).

[Figure 1: Action of an element g in the 2-dimensional rotation group SO(2) on the graph Γ_f of a function f : R → R]

With this formalism, a symmetry group G of the considered PDE is a group G acting on M ⊆ X × U in such a way that if f is a solution, then its transform f_g by the group action is also a solution.

Differential Invariants

We call n-order jet space J^(n) the cartesian product between the space of the independent variables X and enough copies of the space of the dependent variable U to include coordinates for each partial derivative of order less or equal than n:

    J^(n) = X × U × ⋯ × U    (2)

In the above definition, the number of copies of U is the binomial coefficient (p+n choose n), which corresponds to the number of partial derivatives of the function f (assumed to be smooth enough) with order less or equal than n. A function f : X → U represented as u = f(x) can naturally be prolonged to a function u^(n) = f^(n)(x) from X to J^(n) by evaluating f and the corresponding partial derivatives, so that u^(n) = {u_α, |α| ≤ n}.

According to the considered formalism, a generic PDE could therefore be written as follows,

    Δ(x, u^(n)) = 0    (3)

where Δ is an operator from the n-order jet space J^(n) to R. We then denote by pr^(n) G the prolongation of the group action of G to J^(n), for which a prolonged transform g^(n), for g ∈ G, sends the graph Γ_{f^(n)} onto Γ_{(g.f)^(n)}, and by pr^(n) ζ_1, ..., pr^(n) ζ_m the corresponding prolonged vector fields.

In the following, we will be interested in operators Δ associated with PDE having G as a symmetry group. These operators are called the differential invariants of the action of G and are the algebraic invariants of the prolonged group action pr^(n) G, for n ≥ 0. They can be obtained by leveraging on the infinitesimal invariance criteria pr^(n) ζ_i (Δ) = 0 for i = 1, ..., m (Olver 2016) and (Hubert 2009). A set of differential invariants of order n will be generically denoted by ∂φ^G_{u,n} in the sequel.

Examples

We have chosen to work with image classification to illustrate our approach and we have considered the action of the 2-dimensional special euclidean group SE(2) and that of the group ΛR*_+(2) of the scaled translations on X ⊆ R^2 (p = 2), which can be seen as actions on X × U by considering a trivial component for the U part. Using the infinitesimal invariance criteria allows writing the corresponding sets of differential invariants as follows:

    ∂φ^{SE(2)}_{u,2} = { u,  u_x^2 + u_y^2,  u_xx + u_yy,  u_x^2 u_xx + 2 u_x u_y u_xy + u_y^2 u_yy,  u_xx^2 + 2 u_xy^2 + u_yy^2 }    (4)

    ∂φ^{ΛR*_+(2)}_{u,2} = { u_x^2 / u_xx,  u_x^2 / u_xy,  u_x^2 / u_yy,  u_y^2 / u_xx,  u_y^2 / u_xy,  u_y^2 / u_yy }    (5)
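As a concrete illustration of the set (4) (a sketch under our own assumptions, using numpy finite differences rather than the exact discretization used later in the paper), the second-order SE(2) differential invariants of a grayscale image can be approximated as follows:

```python
import numpy as np

def se2_invariants(u):
    """Approximate the second-order SE(2) differential invariants (4) of a
    2D array u, with derivatives taken by finite differences
    (axis 0 = y, axis 1 = x)."""
    uy, ux = np.gradient(u)          # np.gradient returns [du/dy, du/dx]
    uyy, uyx = np.gradient(uy)
    uxy, uxx = np.gradient(ux)
    return {
        "u": u,
        "|grad u|^2": ux ** 2 + uy ** 2,
        "laplacian": uxx + uyy,
        "grad-Hessian-grad": ux ** 2 * uxx + 2 * ux * uy * uxy + uy ** 2 * uyy,
        "|Hessian|^2": uxx ** 2 + 2 * uxy ** 2 + uyy ** 2,
    }

# Example on a random 28x28 array standing in for a ROTMNIST sample
invariants = se2_invariants(np.random.rand(28, 28))
```

The invariants of (5) would be obtained from the same derivative arrays by the element-wise ratios u_x^2/u_xx, ..., u_y^2/u_yy, with some care needed wherever the denominators vanish.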
Equivariant Representations

A map ψ : A → B is said to be equivariant with respect to the action of a group G if ψ(g.a) = g.ψ(a), ∀a ∈ A and ∀g ∈ G. Leveraging on the differential invariants theory previously introduced, we build from d ∈ I representations which are equivariant to the action of a given group G, where I refers to the input space.

To do so, we consider that a data point d ∈ I can be represented by the graph of a function f_d from X to U, so that d = {(x, f_d(x)), x ∈ X}. With this formalism, a gray scale image such as one of the ROTMNIST samples considered in our numerical experiments can be represented by the graph of the function associating each position to its pixel value.

Following similar ideas to (Fang et al. 2017) and (Smets et al. 2020), we model the representation learning process by the following PDE:

    ∂_t u = F(∂φ^G_{u,n}),    u_{t=0} = f_d    (6)

F is a function from the set of the differential invariants to R and F(∂φ^G_{u,n}) is therefore also a differential invariant, any function of the differential invariants being a differential invariant itself. It therefore means that for g ∈ G, g.u_T will also be a solution, so that the learned representation of the data is actually equivariant with respect to the action of G in the sense that g.u_T(f_d) = u_T(g.f_d), where u_T(f_0) corresponds to the solution of (6) at time T with initial condition f_0. Hence, as illustrated in Figure 2 in the case of SE(2), diffusing the PDE (6) allows for extracting similar representations (feature maps on the right) from the inputs f_d (upper left) and g.f_d (lower left).

[Figure 2: Extraction of an SE(2)-equivariant representation of an MNIST sample I_0 with the heat equation ∂_t u = u_xx + u_yy, u_{t=0} = I_0. The equivariant property makes the associated diagram commutative, for g ∈ SE(2).]

Different functions F of the differential invariants lead to different equivariant representations by diffusing the corresponding PDE. As equivariance only is not enough for a representation to be discriminative (e.g., black areas in the corners of the MNIST samples), we will then use a learning approach to identify the representations conveying some meaningful information about the input data through the inference of the function F.

More precisely, we will assume that F belongs to a parametric space, so that F = F_θ, for θ ∈ Θ ⊆ R^k. In the following, F will be chosen to be linear as in (Fang et al. 2017) or, more generally, as a multivariate polynomial in the differential invariants. The corresponding vectorial parameter θ will be part of the trainable parameters of our approach. In the following, we will denote u^θ_T the representation extracted by solving the PDE (6) with F = F_θ. The explicit reference to the initial condition is made by writing u^θ_T(f_d) when needed, but it will generally be dropped to ease the exposition.
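As a quick numerical sanity check of the equivariance property g.u_T(f_d) = u_T(g.f_d) (our own sketch, assuming numpy and scipy, and using the heat equation of Figure 2, whose right-hand side u_xx + u_yy is one of the invariants in (4)), one can compare rotating-then-diffusing with diffusing-then-rotating:

```python
import numpy as np
from scipy.ndimage import laplace, rotate

def diffuse(u, dt=0.1, steps=50):
    """Explicit Euler integration of the heat equation du/dt = u_xx + u_yy,
    whose right-hand side is an SE(2) differential invariant."""
    u = u.astype(float)
    for _ in range(steps):
        u = u + dt * laplace(u)
    return u

img = np.random.rand(28, 28)                        # stand-in for an MNIST sample f_d
angle = 30.0
a = rotate(diffuse(img), angle, reshape=False)      # g.u_T(f_d)
b = diffuse(rotate(img, angle, reshape=False))      # u_T(g.f_d)
print(np.abs(a - b).max())  # small, up to interpolation and boundary effects
```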
An Hybrid Approach

We introduce in this section a generic hybrid approach combining the previously introduced PDE-based equivariant representation learning with fully connected feed-forward layers, following the intuition behind the hierarchical structure of usual CNN.

EqPdeNet Structure

We introduce the EqPdeNet structure, depicted with 2-dimensional data on Figure 3, in which n_e PDE are used to extract the equivariant representations u^{θ_1}_T, ..., u^{θ_{n_e}}_T. A dimension reduction layer (e.g., pooling, linear combination, etc.) is then combined with deeper fully connected layers to produce the outputs.

[Figure 3: EqPdeNet structure combining a first equivariant PDE layer, a dimension reduction layer and deeper fully connected layers]

An output ỹ_d ∈ Y of EqPdeNet corresponding to the input f_d is then computed according to the following formula:

    ỹ_d = N_ω( φ_δ( u^{θ_1}_T(f_d), ..., u^{θ_{n_e}}_T(f_d) ) )    (7)

where N_ω refers to the prediction function of the fully connected layers with weights ω, and δ to the parameters of the dimension reduction layer. Denoting by L : (Y × Y)^{n_t} → R, for n_t ≥ 1, a relevant loss function for the considered learning task, the training of the algorithm therefore consists in finding an approximate solution to the following minimization problem,

    min_{θ_1, ..., θ_{n_e}, ω, δ}  L( (ỹ_{d_i}, y_{d_i})_{i=1}^{n_t} )    (8)

where the (f_{d_i}, y_{d_i})_{i=1}^{n_t} refer to the training samples.

PDE Integration

In order to efficiently find numerical approximations to the equivariant PDE and to train the entire architecture end-to-end, we propose an integration method compatible with the backpropagation technique within automatic differentiation frameworks such as TensorFlow or PyTorch.

Convolution Approach

Following (Ruthotto and Haber 2019) and (Long et al. 2018), our approach consists in approximating the PDE integration operator with usual convolution layers built from well chosen kernels. More precisely, we consider the explicit Euler discretization of (6), which we write as follows:

    u^{ℓ+1} = u^ℓ + Δt × F_θ(∂φ^G_{u^ℓ,n}),    u^0 = f_d    (9)

where u^ℓ = u_{ℓΔt}, Δt > 0 is a discretization parameter and 0 ≤ ℓ ≤ ℓ_T, with ℓ_T = T/Δt. We then consider that each iteration of the above Euler scheme corresponds to a layer of a neural network with input u^ℓ and output u^ℓ + Δt × F_θ(∂φ^G_{u^ℓ}). Our approach, called EulerConv (Figure 4), then consists in implementing this layer by approximating the differential operators ∂_α required for building the differential invariants ∂φ^G_u with appropriate convolution filters.

For each differentiation index α, it is therefore possible to write

    ∂_α U_ℓ = K_α ⋆ U_ℓ    (10)

where U_ℓ is a tensor referring to a discretization of u^ℓ over the domain X, K_α is a constant convolution kernel and ⋆ is the discrete convolution operator (Differential convolution layer on Figure 4). The differential invariants can then be obtained from the values ∂_α U_ℓ, which correspond to the approximate values of the differentials ∂_α u^ℓ computed by finite differences through the convolution kernel K_α (Differential invariants layer on Figure 4). The corresponding output (Update layer on Figure 4) is the result of one step of the Euler scheme (9).

[Figure 4: EulerConv unit allowing to perform one step in the explicit Euler scheme by approximating the differential operators ∂_α with appropriate convolution kernels]
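The sketch below shows one possible realization of the EulerConv unit of Figure 4, under assumptions of ours: TensorFlow, 3x3 central finite-difference kernels K_α, and F_θ chosen linear in the SE(2) invariants of (4). It is an illustration rather than the exact implementation used for the experiments; since the kernels are constant and only θ is trainable, gradients flow through the unit by ordinary back-propagation.

```python
import tensorflow as tf

# Fixed 3x3 finite-difference kernels K_alpha for d/dx, d/dy, d2/dx2, d2/dy2,
# d2/dxdy (unit grid spacing), stacked into one filter bank of shape [3, 3, 1, 5].
KX  = [[0.0, 0.0, 0.0], [-0.5, 0.0, 0.5], [0.0, 0.0, 0.0]]
KY  = [[0.0, -0.5, 0.0], [0.0, 0.0, 0.0], [0.0, 0.5, 0.0]]
KXX = [[0.0, 0.0, 0.0], [1.0, -2.0, 1.0], [0.0, 0.0, 0.0]]
KYY = [[0.0, 1.0, 0.0], [0.0, -2.0, 0.0], [0.0, 1.0, 0.0]]
KXY = [[0.25, 0.0, -0.25], [0.0, 0.0, 0.0], [-0.25, 0.0, 0.25]]
KERNELS = tf.constant([KX, KY, KXX, KYY, KXY], dtype=tf.float32)   # [5, 3, 3]
KERNELS = tf.transpose(KERNELS, [1, 2, 0])[:, :, tf.newaxis, :]    # [3, 3, 1, 5]

class EulerConv(tf.keras.layers.Layer):
    """One explicit Euler step u <- u + dt * F_theta(invariants), the derivatives
    d_alpha u being obtained through fixed convolution kernels as in (10)."""

    def __init__(self, dt=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dt = dt
        # Trainable coefficients theta of a linear F_theta over the 5 invariants of (4).
        self.theta = self.add_weight(name="theta", shape=(5,), initializer="zeros")

    def call(self, u):                                  # u: [batch, H, W, 1]
        d = tf.nn.conv2d(u, KERNELS, strides=1, padding="SAME")
        ux, uy, uxx, uyy, uxy = tf.unstack(d, axis=-1)  # Differential convolution layer
        u0 = u[..., 0]
        invariants = tf.stack(                          # Differential invariants layer
            [u0,
             ux ** 2 + uy ** 2,
             uxx + uyy,
             ux ** 2 * uxx + 2.0 * ux * uy * uxy + uy ** 2 * uyy,
             uxx ** 2 + 2.0 * uxy ** 2 + uyy ** 2],
            axis=-1)                                    # [batch, H, W, 5]
        f_theta = tf.reduce_sum(self.theta * invariants, axis=-1, keepdims=True)
        return u + self.dt * f_theta                    # Update layer

# A ConvInt unit chains several Euler steps with a shared theta, for instance:
# step = EulerConv(dt=0.1)
# u_T = u0
# for _ in range(10):
#     u_T = step(u_T)
```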
The unit allowing to perform the entire Euler scheme is referred to as ConvInt (Figure 5) and is a concatenation of EulerConv layers for the appropriate number of time steps.

[Figure 5: ConvInt unit allowing to integrate the PDE using an explicit discretisation scheme with several EulerConv units]

About Numerical Accuracy

The above convolution approach to the PDE integration can actually be seen as a specific explicit finite difference scheme, therefore raising some natural questions about consistency, stability and convergence. Even for simple choices of the function F_θ, the theoretical analysis of this scheme is not an easy task to perform, due to the strong non-linearity introduced by the differential invariants, and we therefore defer it to further work.

It is however possible to comment on some practical tools that can be used to control the numerical accuracy of the discretization scheme. Considering the time dimension only, it holds that ‖u^ℓ_{Δt} − u_t‖ = o(Δt) as Δt → 0 (for t = ℓΔt), so that we can make the time discretization error arbitrarily small by decreasing the parameter Δt and adding more EulerConv units accordingly in the ConvInt units. With respect to the space dimension, the discretization error can be controlled by increasing the number of sampling points when building U_ℓ from u^ℓ_{Δt}.

In some practical situations such as image processing tasks, the input data lies in a discrete manifold, so that the smooth functional representation that was previously introduced does not directly apply. In this case, interpolation methods such as the functional convolution can be used to obtain continuous inputs (Simard et al. 1998).

Numerical Experiments

We provide in this section the results of numerical experiments we have conducted on the 2-dimensional problem of image classification. However, as it is generically applicable to symmetric learning tasks on smooth functional data, our approach is not specific to image classification and could for instance be instantiated to predict the evolution of a physical system with symmetries (Noether's theorem), as long as a generating set of differential invariants can be efficiently computed for the corresponding symmetry group.

Following the line of existing work with respect to the testing of equivariant algorithms, we have built our numerical experiments from the ROTMNIST dataset, and we emphasize here that we did not use any kind of data augmentation technique for the training step. More precisely, the ROTMNIST dataset was built in (Larochelle et al. 2007) from the original MNIST digits by applying to the original samples random rotations with angles sampled uniformly in [0, 2π]. In the following, algorithms have been trained on the 12k training samples and all results have been obtained with a TensorFlow based implementation of our approach running on a GeForce RTX 2080 Nvidia card.

We have used an EqPdeNet network with a PDE layer aiming to build equivariant data representations with respect to the translation group, and either the rotation or the scaling group. More precisely, the PDE layer includes two PDE built from the differential invariants ∂φ^{SE(2)}_{u,2} and two others combining those of ∂φ^{ΛR*_+(2)}_{u,2}, whose outputs are then linearly combined.

To illustrate the benefits of our approach when compared to corresponding fully connected neural networks (FCNN) from both accuracy and robustness standpoints, we have built several scenarios from the original ROTMNIST testing set, namely

• iso: a random isometry, i.e. a combination of a random translation of (t_h, t_v) pixels and a random rotation of θ degrees, where t_h ∼ U(−2, 2), t_v ∼ U(−2, 2), and θ ∼ U(−30, 30), is applied to each of the original testing samples.

• sca: a random scaling transform (x, y) → (λx, λy) with parameter λ ∼ U(2/3, 1) is applied to each of the original testing samples.

where U(a, b) refers to the uniform distribution on the interval [a, b].
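For concreteness, the two perturbation scenarios can be generated, for instance, with scipy.ndimage as in the sketch below (our own illustration; the exact resampling and interpolation choices used for the reported numbers are not specified in the text):

```python
import numpy as np
from scipy.ndimage import affine_transform, rotate, shift

rng = np.random.default_rng(0)

def iso_perturb(img):
    """iso scenario: random translation (t_h, t_v) ~ U(-2, 2) pixels combined
    with a random rotation theta ~ U(-30, 30) degrees."""
    t_h, t_v = rng.uniform(-2.0, 2.0, size=2)
    theta = rng.uniform(-30.0, 30.0)
    out = rotate(img, theta, reshape=False, order=1)
    return shift(out, (t_v, t_h), order=1)          # (rows, cols) = (vertical, horizontal)

def sca_perturb(img):
    """sca scenario: scaling (x, y) -> (lambda x, lambda y) about the image
    center, with lambda ~ U(2/3, 1)."""
    lam = rng.uniform(2.0 / 3.0, 1.0)
    center = (np.array(img.shape) - 1) / 2.0
    matrix = np.eye(2) / lam                        # maps output coords to input coords
    offset = center - matrix @ center               # keep the center fixed
    return affine_transform(img, matrix, offset=offset, order=1)

# Perturbed copies of a test set X_test of shape [N, 28, 28]:
# X_iso = np.stack([iso_perturb(x) for x in X_test])
# X_sca = np.stack([sca_perturb(x) for x in X_test])
```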
The accuracy results in each scenario, obtained after averaging over 10 instances of testing to smooth out the statistical noise, are given in Tables 1 and 2, together with the corresponding standard deviations (in parentheses).

Table 1: Accuracy of the EqPdeNet network in several scenarios after training on the original ROTMNIST training samples

#param    Test         iso          sca
13033     70.7 (2.7)   37.3 (1.6)   23.1 (1.3)
26537     77.9 (0.6)   39.6 (1.3)   24.2 (1.1)
55081     82.7 (0.7)   43.8 (0.9)   25.8 (1.4)
118313    86.8 (0.6)   48.8 (1.1)   28.2 (1.0)
269353    89.9 (0.8)   53.0 (0.9)   30.2 (1.1)

Table 2: Accuracy of FCNN in several scenarios after training on the original ROTMNIST training samples

#param    Test         iso          sca
13002     65.5 (1.5)   36.7 (1.4)   23.5 (0.6)
26506     73.4 (0.9)   37.9 (1.2)   23.4 (0.7)
55050     80.8 (0.5)   42.4 (1.0)   25.0 (1.2)
118282    85.7 (0.4)   47.0 (0.8)   26.9 (0.9)
269322    87.7 (0.5)   49.5 (0.6)   28.3 (0.8)

We see that the accuracy on the testing set is consistently higher with our approach than with the corresponding FCNN, for all the considered numbers of parameters. With respect to robustness, higher accuracies are reached with our approach in the iso and sca testing scenarios as the number of parameters increases, consistently with the increase of the overfitting risk.

Hence, although less performant than G-CNN, which are able to achieve a testing accuracy of almost 99% with SE(2) equivariance (Finzi et al. 2020), because of its simpler structure and approximate equivariance only, our EqPdeNet approach does provide material improvements with respect to the usual FCNN, from both accuracy and robustness standpoints.

Conclusions and Further Work

In this paper we proposed a hybrid architecture with a first PDE-based layer made equivariant to generic group actions by leveraging on differential invariants theory. This structure allows achieving simultaneous approximate equivariance with respect to several group actions by aggregating the learned inner representations through a dimension reduction layer feeding deeper fully connected layers.
In order to make the approach practical, we have specified an end-to-end training method compatible with the usual automatic differentiation frameworks, in which the numerical approximations to the several PDE solutions are obtained through the use of fixed-weight convolution operators.

We have performed numerical testing on the ROTMNIST dataset and have shown the superiority of our approach from both accuracy and robustness standpoints when compared to fully connected neural networks. Our results are however below those reported for G-CNN, as our approach is simpler and does not ensure strict equivariance. However, the PDE built from the differential invariants are easier to interpret than group-based convolution kernels.

Although we believe the approach and our preliminary numerical results to be promising, additional work is needed for deriving rigorous rules with respect to hyperparameter setting. In particular, a theoretical analysis of the convergence of the discretization units for several Lie groups and PDE types would be valuable.

Finally, by leveraging on the interpretability feature of our approach, we plan to conduct analyses of the learned equivariant representations so as to refine the choice of the parametric form of the PDE to be considered, to study the opportunity to use partial specification techniques, and to discuss some safety and certification aspects. Also, as done in (Finzi et al. 2020) to model convolution kernels, using a small neural network instead of a multivariate polynomial for the parameterization of the differential invariants through the function F_θ may help improve the expressiveness of our approach.

References

Baldi, P.; Sadowski, P.; and Whiteson, D. 2014. Searching for exotic particles in high-energy physics with deep learning. Nature Communications 5(1): 1–9.

Bekkers, E. J. 2019. B-spline CNNs on Lie groups. arXiv preprint arXiv:1909.12057.

Carrasquilla, J.; and Melko, R. G. 2017. Machine learning phases of matter. Nature Physics 13(5): 431–434.

Chen, R. T.; Rubanova, Y.; Bettencourt, J.; and Duvenaud, D. K. 2018. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, 6571–6583.

Cohen, T.; and Welling, M. 2016. Group equivariant convolutional networks. In International Conference on Machine Learning, 2990–2999.

Cohen, T. S.; Geiger, M.; Köhler, J.; and Welling, M. 2018. Spherical CNNs. arXiv preprint arXiv:1801.10130.

Esteves, C.; Allen-Blanchette, C.; Makadia, A.; and Daniilidis, K. 2018. Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), 52–68.

Fang, C.; Zhao, Z.; Zhou, P.; and Lin, Z. 2017. Feature learning via partial differential equation with applications to face recognition. Pattern Recognition 69: 14–25.

Finzi, M.; Stanton, S.; Izmailov, P.; and Wilson, A. G. 2020. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. arXiv preprint arXiv:2002.12880.

Gens, R.; and Domingos, P. M. 2014. Deep symmetry networks. In Advances in Neural Information Processing Systems, 2537–2545.

Hornik, K.; Stinchcombe, M.; White, H.; et al. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2(5): 359–366.

Huang, Z.; Wan, C.; Probst, T.; and Van Gool, L. 2017. Deep learning on Lie groups for skeleton-based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6099–6108.

Hubert, E. 2009. Differential invariants of a Lie group action: Syzygies on a generating set. Journal of Symbolic Computation 44(4): 382–416. doi:10.1016/j.jsc.2008.08.003. URL http://www.sciencedirect.com/science/article/pii/S0747717108001089.

Larochelle, H.; Erhan, D.; Courville, A.; Bergstra, J.; and Bengio, Y. 2007. An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. In Proceedings of the 24th International Conference on Machine Learning, ICML '07, 473–480. New York, NY, USA: Association for Computing Machinery. doi:10.1145/1273496.1273556.

LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11): 2278–2324.

Long, Z.; Lu, Y.; Ma, X.; and Dong, B. 2018. PDE-Net: Learning PDEs from Data. In Proceedings of Machine Learning Research, volume 80, 3208–3216. Stockholmsmässan, Stockholm, Sweden: PMLR. URL http://proceedings.mlr.press/v80/long18a.html.
Marcos, D.; Volpi, M.; and Tuia, D. 2016. Learning rotation invariant convolutional filters for texture classification. In 2016 23rd International Conference on Pattern Recognition (ICPR), 2012–2017. IEEE.

Olver, P. 1993. Applications of Lie Groups to Differential Equations. New York, NY, USA: Springer-Verlag.

Olver, P. 2016. Equivariant Moving Frames for Euclidean Surfaces.

Oyallon, E.; and Mallat, S. 2015. Deep roto-translation scattering for object classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2865–2873.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378: 686–707.

Raissi, M.; Yazdani, A.; and Karniadakis, G. E. 2020. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science 367(6481): 1026–1030.

Ruthotto, L.; and Haber, E. 2019. Deep Neural Networks Motivated by Partial Differential Equations. Journal of Mathematical Imaging and Vision 62: 352–364.

Shen, Z.; He, L.; Lin, Z.; and Ma, J. 2020. PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions. arXiv preprint arXiv:2007.10408.

Simard, P.; LeCun, Y.; Denker, J. S.; and Victorri, B. 1998. Transformation Invariance in Pattern Recognition - Tangent Distance and Tangent Propagation. In Neural Networks: Tricks of the Trade, 239–274. Berlin, Heidelberg: Springer-Verlag.

Smets, B.; Portegies, J.; Bekkers, E.; and Duits, R. 2020. PDE-based Group Equivariant Convolutional Neural Networks. arXiv preprint arXiv:2001.09046.

Szegedy, C.; Ioffe, S.; Vanhoucke, V.; and Alemi, A. A. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence.

Van Nieuwenburg, E. P.; Liu, Y.-H.; and Huber, S. D. 2017. Learning phase transitions by confusion. Nature Physics 13(5): 435–439.

Weiler, M.; Hamprecht, F. A.; and Storath, M. 2018. Learning steerable filters for rotation equivariant CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 849–858.

Worrall, D. E.; Garbin, S. J.; Turmukhambetov, D.; and Brostow, G. J. 2017. Harmonic networks: Deep translation and rotation equivariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5028–5037.

Xiong, W.; Droppo, J.; Huang, X.; Seide, F.; Seltzer, M.; Stolcke, A.; Yu, D.; and Zweig, G. 2016. Achieving human parity in conversational speech recognition. arXiv preprint arXiv:1610.05256.

Zaheer, M.; Kottur, S.; Ravanbakhsh, S.; Poczos, B.; Salakhutdinov, R. R.; and Smola, A. J. 2017. Deep sets. In Advances in Neural Information Processing Systems, 3391–3401.