Toward Geometrical Robustness with Hybrid Deep Learning and Differential Invariants Theory

Pierre-Yves Lagrave, Mathieu Riou
Thales Research and Technology France, 1 avenue Augustin Fresnel, 91767 Palaiseau cedex
pierre-yves.lagrave@thalesgroup.com, mathieu.riou@thalesgroup.com

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Symmetries are ubiquitous in physics problems and should be taken into account when neural networks are used to approximate their solutions. Embedding symmetries within neural networks by using equivariant layers has been shown to be efficient from an accuracy standpoint. Building equivariant structures also appears appealing from a robustness standpoint, since the use of correct-by-design algorithms alleviates the verification step, which is a prerequisite to any critical application such as safety and military related tasks. However, generically enforcing equivariance in neural networks requires the use of cumbersome operators such as group-based convolution kernels, whose outputs may be hard to interpret. In this paper, we introduce EqPdeNet, an alternative method in which equivariant partial differential equations are embedded within the first layer of a neural network. This approach provides approximate equivariance with respect to any Lie group action and allows combining several types of equivariance within the same network. Moreover, the structure of the associated partial differential equations can be directly related to the physical nature of the input data, making this approach particularly appealing from an interpretability standpoint when compared to the use of group-based convolution kernels.
Introduction

Symmetries are ubiquitous in physics, with finite groups of symmetries such as the hexagonal lattice of graphene and continuous groups such as the Lorentz group in particle physics. Highlighted by their successes in image and speech recognition (Szegedy et al. 2017), (Xiong et al. 2016), neural networks are now used in various physics fields such as fluid mechanics (Raissi, Perdikaris, and Karniadakis 2019), (Raissi, Yazdani, and Karniadakis 2020), high energy physics (Baldi, Sadowski, and Whiteson 2014) or condensed matter physics (Carrasquilla and Melko 2017), (Van Nieuwenburg, Liu, and Huber 2017).

Convolutional Neural Networks (CNN) (LeCun et al. 1998) have demonstrated the efficiency of embedding translation symmetries into the design of a neural network for image processing tasks. More generally, directly encoding the required symmetries into the algorithm design decreases the number of parameters and increases the robustness. Using algorithms in which given properties are enforced through their specification (correct-by-design algorithms) also has the advantage of being more amenable to critical applications such as safety and military related tasks. In regard to the diversity of symmetries occurring in physics, a generic approach for embedding symmetry groups into the neural network design is needed. Group-Convolutional Neural Networks (G-CNN), first introduced by (Cohen and Welling 2016), have recently been extended to generic groups of symmetries (Finzi et al. 2020) and achieve this purpose. However, they rely on a cumbersome specification and are hard to interpret.

In this paper, we introduce the EqPdeNet neural network by leveraging on the differential invariants of Lie group actions. Within EqPdeNet, equivariant inner representations of the input data are built through the first layer of the neural network and are then processed by deeper fully connected layers, following the hierarchical structure of the usual CNN. This hybrid approach applies to any Lie group without requiring the group action to act transitively on the input manifold, and increases the accuracy and the robustness by achieving approximate equivariance. In addition, the structure of the associated Partial Differential Equations (PDE) can be directly related to the physical nature of the input data, making this approach particularly appealing from an interpretability standpoint when compared to the use of group-based convolution kernels.

Related Work and Contribution

We review in the following the related work, first focusing on G-CNN and then highlighting the existing duality between neural networks and PDE.

Group-Convolutional Neural Networks

The success of CNN for image processing tasks has motivated several works on the generalization of their translation equivariant layers to other types of transforms. In this context, (Cohen and Welling 2016) introduced the concept of Group-Convolutional Neural Network (G-CNN) by extending the principle of weight sharing to satisfy symmetries other than translations, focusing on discrete groups such as p4 and p4m. Other works focused on specific symmetry groups such as the permutation group (Zaheer et al. 2017), some discrete subgroups of the 2-dimensional rotation group SO(2) (Marcos, Volpi, and Tuia 2016), SO(2) itself (Oyallon and Mallat 2015), (Worrall et al. 2017), (Weiler, Hamprecht, and Storath 2018), and the 3-dimensional translation and rotation groups (Cohen et al. 2018), (Esteves et al. 2018).

These approaches were later generalized to more general sets of transforms and, in particular, to those arising from a transitive action of a Lie group (Gens and Domingos 2014), (Huang et al. 2017), (Bekkers 2019). Recently, a generic approach was proposed (Finzi et al. 2020) without requiring the group action to be transitive.

All these works aim at generalizing the usual CNN structure by building equivariant layers to make the overall network equivariant. Our approach rather aims at achieving approximate equivariance through the use of one PDE layer and does not use group-based convolution kernels.
PDE-Based Neural Networks

Motivated by the universal approximation theorem (Hornik et al. 1989), neural networks have been used to approximate the solutions of PDE. A major work in this area is the introduction of the Physics Informed Neural Network (PINN) approach (Raissi, Perdikaris, and Karniadakis 2019) as an alternative to the usual finite difference methods.

(Chen et al. 2018) emphasizes that a residual neural network can actually be seen as a discretization of an unknown Ordinary Differential Equation (ODE) and shows how to efficiently learn the ODE parameters from the data by using adjoint techniques and classical ODE solvers. Building on similar ideas, (Ruthotto and Haber 2019) uses the ODE formulation of the neural network to introduce inductive bias, such as parabolic or hyperbolic properties, to enforce respectively robustness to perturbations and low memory usage. The use of a differential equation formulation to embed desired properties into the neural network is a common feature with our work. However, the question of symmetry is not considered in these works.

The use of PDE has also appeared useful for building equivariant structures. (Shen et al. 2020) introduces a kernel equivariant to the isometry group SE(2) from differential operators approximated through usual kernel convolutions. In (Smets et al. 2020), a neural network equivariant to a generic transitive Lie group action is proposed by using several layers of equivariant PDE, the training of the algorithm consisting in finding the parameters of the PDE. Less recently, but closer to the present work, (Fang et al. 2017) have used a single PDE to extract equivariant features for a linear classifier by leveraging on differential invariants theory.

The hybrid approach we are proposing is applicable to any Lie group action provided that a generating set of differential invariants can be efficiently computed, and it allows for several group actions to be considered simultaneously. Also, thanks to a convolution-based integration of the PDE layer, an end-to-end training can be performed within an automatic differentiation framework such as TensorFlow or PyTorch.

Contribution

The main contributions of this paper are the following:

• We introduce the EqPdeNet hybrid architecture featuring a first equivariant PDE layer, built by leveraging on the differential invariants of Lie group actions and followed by usual fully connected layers. Our approach in particular allows considering several types of equivariance within the first PDE layer.

• We give some numerical evidence to support the interest of our approach from both performance and robustness standpoints by performing comparisons with the behavior of usual neural networks on the ROTMNIST dataset (Larochelle et al. 2007).

• We provide a numerical integration scheme for arbitrary PDE by using discrete convolution operators, making the EqPdeNet approach compatible with end-to-end training through back-propagation within an automatic differentiation framework.

Invariance and PDE

By leveraging on the formalism introduced in (Olver 1993), we give in the following some general background about invariance theory for PDE. This will allow us to introduce the notion of differential invariants of a Lie group action, which is central to our work, and to explain how to build equivariant representations of input data by solving a specific type of PDE.

Symmetry Group

Formally, we will see a PDE of order n in p independent variables x = (x_1, ..., x_p) ∈ X and one dependent variable u = u(x_1, ..., x_p) ∈ U as an equation involving x, u and u_α = ∂_α u, for α ∈ N^k, k ≥ 0 and |α| ≤ n. A PDE solution will be of the form u = f(x).

In the following, we denote by X = R^p, with coordinates (x_1, ..., x_p), the space of the independent variables and by U = R, with coordinates u, that of the dependent variable. Let us then consider a Lie group G of dimension m acting as g.(x, u) on a sub-manifold M ⊆ X × U, with its Lie algebra g generated by the vector fields ζ_1, ..., ζ_m. For instance, G could be the 2-dimensional rotation group SO(2) acting on X × U ≃ R^2 with the infinitesimal generator ζ_1 = −u ∂_x + x ∂_u.
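As a purely illustrative sketch (our own addition, assuming only numpy, and not part of the original formalism beyond what is stated above), the flow of the generator ζ_1 = −u ∂_x + x ∂_u is the usual planar rotation, and applying it to the sampled graph of a function realizes the action g.(x, u) used throughout this section:

```python
import numpy as np

def so2_action(points, angle):
    """Apply g in SO(2) to points (x, u) of a functional graph.

    Exponentiating the generator zeta_1 = -u d/dx + x d/du for a "time"
    equal to `angle` yields the standard 2x2 rotation matrix acting on (x, u).
    """
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s],
                    [s,  c]])
    return points @ rot.T

# Graph of f(x) = exp(-x^2) sampled on [-2, 2], and its transform g.Gamma_f
x = np.linspace(-2.0, 2.0, 200)
graph = np.stack([x, np.exp(-x ** 2)], axis=1)     # points (x, f(x))
rotated_graph = so2_action(graph, np.pi / 6)       # rotation by 30 degrees
```

For small enough angles the transformed point set is still the graph of a function, which is the setting illustrated on Figure 1 below.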
We can define the transform of a function u = f(x) under the action of G by identifying f with its graph Γ_f = {(x, f(x)), x ∈ Ω ⊆ X} ⊆ X × U and by defining g.f = f_g, where the function f_g is the function associated with the transformed graph g.Γ_f, defined as follows for g ∈ G:

    g.Γ_f = {g.(x, f(x)), (x, f(x)) ∈ Γ_f} = Γ_{f_g}    (1)

These notions of transformed function and group action on functional graphs are illustrated on Figure 1, where the graph of a function f : R → R (left) is transformed according to the action of a group element g ∈ SO(2) by simple rotation (right).

[Figure 1: Action of an element g in the 2-dimensional rotation group SO(2) on the graph Γ_f of a function f : R → R]

With this formalism, a symmetry group G of the considered PDE is a group G acting on M ⊆ X × U in such a way that if f is a solution, then its transform f_g by the group action is also a solution.

Differential Invariants

We call n-order jet space J^(n) the cartesian product between the space of the independent variables X and enough copies of the space of the dependent variable U to include coordinates for each partial derivative of order less or equal than n:

    J^(n) = X × U × ⋯ × U    (2)

In the above definition, the number of copies of U is the binomial coefficient (p+n choose n), which corresponds to the number of partial derivatives of the function f (assumed to be smooth enough) with order less or equal than n. A function f : X → U represented as u = f(x) can naturally be prolonged to a function u^(n) = f^(n)(x) from X to J^(n) by evaluating f and the corresponding partial derivatives, so that u^(n) = {u_α, |α| ≤ n}.

According to the considered formalism, a generic PDE could therefore be written as follows,

    Δ(x, u^(n)) = 0    (3)

where Δ is an operator from the n-order jet space J^(n) to R. We then denote by pr^(n) G the prolongation of the group action of G to J^(n), for which a prolonged transform g^(n), for g ∈ G, sends the graph Γ_{f^(n)} onto Γ_{(g.f)^(n)}, and by pr^(n) ζ_1, ..., pr^(n) ζ_m the corresponding prolonged vector fields.

In the following, we will be interested in operators Δ associated with PDE having G as a symmetry group. These operators are called the differential invariants of the action of G and are the algebraic invariants of the prolonged group action pr^(n) G, for n ≥ 0. They can be obtained by leveraging on the infinitesimal invariance criteria pr^(n) ζ_i (Δ) = 0 for i = 1, ..., m (Olver 2016) and (Hubert 2009). A set of differential invariants of order n will be generically denoted by ∂φ^G_{u,n} in the sequel.

Examples

We have chosen to work with image classification to illustrate our approach and we have considered the action of the 2-dimensional special euclidean group SE(2) and that of the group ΛR*_+(2) of the scaled translations on X ⊆ R^2 (p = 2), which can be seen as actions on X × U by considering a trivial component for the U part. Using the infinitesimal invariance criteria allows writing the corresponding sets of differential invariants as follows:

    ∂φ^{SE(2)}_{u,2} = { u,  u_x^2 + u_y^2,  u_xx + u_yy,  u_x^2 u_xx + 2 u_x u_y u_xy + u_y^2 u_yy,  u_xx^2 + 2 u_xy^2 + u_yy^2 }    (4)

    ∂φ^{ΛR*_+(2)}_{u,2} = { u_x^2 / u_xx,  u_x^2 / u_xy,  u_x^2 / u_yy,  u_y^2 / u_xx,  u_y^2 / u_xy,  u_y^2 / u_yy }    (5)
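As a concrete illustration of the set (4) (a sketch under our own assumptions, using numpy finite differences rather than the exact discretization used later in the paper), the second-order SE(2) differential invariants of a grayscale image can be approximated as follows:

```python
import numpy as np

def se2_invariants(u):
    """Approximate the second-order SE(2) differential invariants (4) of a
    2D array u, with derivatives taken by finite differences
    (axis 0 = y, axis 1 = x)."""
    uy, ux = np.gradient(u)          # np.gradient returns [du/dy, du/dx]
    uyy, uyx = np.gradient(uy)
    uxy, uxx = np.gradient(ux)
    return {
        "u": u,
        "|grad u|^2": ux ** 2 + uy ** 2,
        "laplacian": uxx + uyy,
        "grad-Hessian-grad": ux ** 2 * uxx + 2 * ux * uy * uxy + uy ** 2 * uyy,
        "|Hessian|^2": uxx ** 2 + 2 * uxy ** 2 + uyy ** 2,
    }

# Example on a random 28x28 array standing in for a ROTMNIST sample
invariants = se2_invariants(np.random.rand(28, 28))
```

The invariants of (5) would be obtained from the same derivative arrays by the element-wise ratios u_x^2/u_xx, ..., u_y^2/u_yy, with some care needed wherever the denominators vanish.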
Equivariant Representations

A map ψ : A → B is said to be equivariant with respect to the action of a group G if ψ(g.a) = g.ψ(a), ∀a ∈ A and ∀g ∈ G. Leveraging on the differential invariants theory previously introduced, we build from d ∈ I representations which are equivariant to the action of a given group G, where I refers to the input space.

To do so, we consider that a data point d ∈ I can be represented by the graph of a function f_d from X to U, so that d = {(x, f_d(x)), x ∈ X}. With this formalism, a gray scale image such as one of the ROTMNIST samples considered in our numerical experiments can be represented by the graph of the function associating each position to its pixel value.

Following similar ideas to (Fang et al. 2017) and (Smets et al. 2020), we model the representation learning process by the following PDE:

    ∂_t u = F(∂φ^G_{u,n}),    u_{t=0} = f_d    (6)

F is a function from the set of the differential invariants to R and F(∂φ^G_{u,n}) is therefore also a differential invariant, any function of the differential invariants being a differential invariant itself. It therefore means that for g ∈ G, g.u_T will also be a solution, so that the learned representation of the data is actually equivariant with respect to the action of G in the sense that g.u_T(f_d) = u_T(g.f_d), where u_T(f_0) corresponds to the solution of (6) at time T with initial condition f_0. Hence, as illustrated in Figure 2 in the case of SE(2), diffusing the PDE (6) allows for extracting similar representations (feature maps on the right) from the inputs f_d (upper left) and g.f_d (lower left).

[Figure 2: Extraction of an SE(2)-equivariant representation of an MNIST sample I_0 with the heat equation ∂_t u = u_xx + u_yy, u_{t=0} = I_0. The equivariant property makes the associated diagram commutative, for g ∈ SE(2).]

Different functions F of the differential invariants lead to different equivariant representations by diffusing the corresponding PDE. As equivariance only is not enough for a representation to be discriminative (e.g., black areas in the corners of the MNIST samples), we will then use a learning approach to identify the representations conveying some meaningful information about the input data through the inference of the function F.

More precisely, we will assume that F belongs to a parametric space, so that F = F_θ, for θ ∈ Θ ⊆ R^k. In the following, F will be chosen to be linear as in (Fang et al. 2017) or, more generally, as a multivariate polynomial in the differential invariants. The corresponding vectorial parameter θ will be part of the trainable parameters of our approach. In the following, we will denote u^θ_T the representation extracted by solving the PDE (6) with F = F_θ. The explicit reference to the initial condition is made by writing u^θ_T(f_d) when needed, but it will generally be dropped to ease the exposition.
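As a quick numerical sanity check of the equivariance property g.u_T(f_d) = u_T(g.f_d) (our own sketch, assuming numpy and scipy, and using the heat equation of Figure 2, whose right-hand side u_xx + u_yy is one of the invariants in (4)), one can compare rotating-then-diffusing with diffusing-then-rotating:

```python
import numpy as np
from scipy.ndimage import laplace, rotate

def diffuse(u, dt=0.1, steps=50):
    """Explicit Euler integration of the heat equation du/dt = u_xx + u_yy,
    whose right-hand side is an SE(2) differential invariant."""
    u = u.astype(float)
    for _ in range(steps):
        u = u + dt * laplace(u)
    return u

img = np.random.rand(28, 28)                        # stand-in for an MNIST sample f_d
angle = 30.0
a = rotate(diffuse(img), angle, reshape=False)      # g.u_T(f_d)
b = diffuse(rotate(img, angle, reshape=False))      # u_T(g.f_d)
print(np.abs(a - b).max())  # small, up to interpolation and boundary effects
```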
An Hybrid Approach

We introduce in this section a generic hybrid approach combining the previously introduced PDE-based equivariant representation learning with fully connected feed-forward layers, following the intuition behind the hierarchical structure of usual CNN.

EqPdeNet Structure

We introduce the EqPdeNet structure, depicted with 2-dimensional data on Figure 3, in which n_e PDE are used to extract the equivariant representations u^{θ_1}_T, ..., u^{θ_{n_e}}_T. A dimension reduction layer (e.g., pooling, linear combination, etc.) is then combined with deeper fully connected layers to produce the outputs.

[Figure 3: EqPdeNet structure combining a first equivariant PDE layer, a dimension reduction layer and deeper fully connected layers]

An output ỹ_d ∈ Y of EqPdeNet corresponding to the input f_d is then computed according to the following formula:

    ỹ_d = N_ω( φ_δ( u^{θ_1}_T(f_d), ..., u^{θ_{n_e}}_T(f_d) ) )    (7)

where N_ω refers to the prediction function of the fully connected layers with weights ω, and δ to the parameters of the dimension reduction layer. Denoting by L : (Y × Y)^{n_t} → R, for n_t ≥ 1, a relevant loss function for the considered learning task, the training of the algorithm therefore consists in finding an approximate solution to the following minimization problem,

    min_{θ_1, ..., θ_{n_e}, ω, δ}  L( (ỹ_{d_i}, y_{d_i})_{i=1}^{n_t} )    (8)

where the (f_{d_i}, y_{d_i})_{i=1}^{n_t} refer to the training samples.

PDE Integration

In order to efficiently find numerical approximations to the equivariant PDE and to train the entire architecture end-to-end, we propose an integration method compatible with the backpropagation technique within automatic differentiation frameworks such as TensorFlow or PyTorch.

Convolution Approach

Following (Ruthotto and Haber 2019) and (Long et al. 2018), our approach consists in approximating the PDE integration operator with usual convolution layers built from well chosen kernels. More precisely, we consider the explicit Euler discretization of (6), which we write as follows:

    u^{ℓ+1} = u^ℓ + Δt × F_θ(∂φ^G_{u^ℓ,n}),    u^0 = f_d    (9)

where u^ℓ = u_{ℓΔt}, Δt > 0 is a discretization parameter and 0 ≤ ℓ ≤ ℓ_T, with ℓ_T = T/Δt. We then consider that each iteration of the above Euler scheme corresponds to a layer of a neural network with input u^ℓ and output u^ℓ + Δt × F_θ(∂φ^G_{u^ℓ}). Our approach, called EulerConv (Figure 4), then consists in implementing this layer by approximating the differential operators ∂_α required for building the differential invariants ∂φ^G_u with appropriate convolution filters.

For each differentiation index α, it is therefore possible to write

    ∂_α U_ℓ = K_α ⋆ U_ℓ    (10)

where U_ℓ is a tensor referring to a discretization of u^ℓ over the domain X, K_α is a constant convolution kernel and ⋆ is the discrete convolution operator (Differential convolution layer on Figure 4). The differential invariants can then be obtained from the values ∂_α U_ℓ, which correspond to the approximate values of the differentials ∂_α u^ℓ computed by finite differences through the convolution kernel K_α (Differential invariants layer on Figure 4). The corresponding output (Update layer on Figure 4) is the result of one step of the Euler scheme (9).

[Figure 4: EulerConv unit allowing to perform one step in the explicit Euler scheme by approximating the differential operators ∂_α with appropriate convolution kernels]
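The sketch below shows one possible realization of the EulerConv unit of Figure 4, under assumptions of ours: TensorFlow, 3x3 central finite-difference kernels K_α, and F_θ chosen linear in the SE(2) invariants of (4). It is an illustration rather than the exact implementation used for the experiments; since the kernels are constant and only θ is trainable, gradients flow through the unit by ordinary back-propagation.

```python
import tensorflow as tf

# Fixed 3x3 finite-difference kernels K_alpha for d/dx, d/dy, d2/dx2, d2/dy2,
# d2/dxdy (unit grid spacing), stacked into one filter bank of shape [3, 3, 1, 5].
KX  = [[0.0, 0.0, 0.0], [-0.5, 0.0, 0.5], [0.0, 0.0, 0.0]]
KY  = [[0.0, -0.5, 0.0], [0.0, 0.0, 0.0], [0.0, 0.5, 0.0]]
KXX = [[0.0, 0.0, 0.0], [1.0, -2.0, 1.0], [0.0, 0.0, 0.0]]
KYY = [[0.0, 1.0, 0.0], [0.0, -2.0, 0.0], [0.0, 1.0, 0.0]]
KXY = [[0.25, 0.0, -0.25], [0.0, 0.0, 0.0], [-0.25, 0.0, 0.25]]
KERNELS = tf.constant([KX, KY, KXX, KYY, KXY], dtype=tf.float32)   # [5, 3, 3]
KERNELS = tf.transpose(KERNELS, [1, 2, 0])[:, :, tf.newaxis, :]    # [3, 3, 1, 5]

class EulerConv(tf.keras.layers.Layer):
    """One explicit Euler step u <- u + dt * F_theta(invariants), the derivatives
    d_alpha u being obtained through fixed convolution kernels as in (10)."""

    def __init__(self, dt=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dt = dt
        # Trainable coefficients theta of a linear F_theta over the 5 invariants of (4).
        self.theta = self.add_weight(name="theta", shape=(5,), initializer="zeros")

    def call(self, u):                                  # u: [batch, H, W, 1]
        d = tf.nn.conv2d(u, KERNELS, strides=1, padding="SAME")
        ux, uy, uxx, uyy, uxy = tf.unstack(d, axis=-1)  # Differential convolution layer
        u0 = u[..., 0]
        invariants = tf.stack(                          # Differential invariants layer
            [u0,
             ux ** 2 + uy ** 2,
             uxx + uyy,
             ux ** 2 * uxx + 2.0 * ux * uy * uxy + uy ** 2 * uyy,
             uxx ** 2 + 2.0 * uxy ** 2 + uyy ** 2],
            axis=-1)                                    # [batch, H, W, 5]
        f_theta = tf.reduce_sum(self.theta * invariants, axis=-1, keepdims=True)
        return u + self.dt * f_theta                    # Update layer

# A ConvInt unit chains several Euler steps with a shared theta, for instance:
# step = EulerConv(dt=0.1)
# u_T = u0
# for _ in range(10):
#     u_T = step(u_T)
```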
The unit allowing to perform the entire Euler scheme is referred to as ConvInt (Figure 5) and is a concatenation of EulerConv layers for the appropriate number of time steps.

[Figure 5: ConvInt unit allowing to integrate the PDE using an explicit discretisation scheme with several EulerConv units]

About Numerical Accuracy

The above convolution approach to the PDE integration can actually be seen as a specific explicit finite difference scheme, therefore raising some natural questions about consistency, stability and convergence. Even for simple choices of the function F_θ, the theoretical analysis of this scheme is not an easy task to perform, due to the strong non-linearity introduced by the differential invariants, and we therefore defer it to further work.

It is however possible to comment on some practical tools that can be used to control the numerical accuracy of the discretization scheme. Considering the time dimension only, it holds that ‖u^ℓ_{Δt} − u_t‖ = o(Δt) as Δt → 0 (for t = ℓΔt), so that we can make the time discretization error arbitrarily small by decreasing the parameter Δt and adding more EulerConv units accordingly in the ConvInt units. With respect to the space dimension, the discretization error can be controlled by increasing the number of sampling points when building U_ℓ from u^ℓ_{Δt}.

In some practical situations such as image processing tasks, the input data lies in a discrete manifold, so that the smooth functional representation that was previously introduced does not directly apply. In this case, interpolation methods such as the functional convolution can be used to obtain continuous inputs (Simard et al. 1998).

Numerical Experiments

We provide in this section the results of numerical experiments we have conducted on the 2-dimensional problem of image classification. However, as it is generically applicable to symmetric learning tasks on smooth functional data, our approach is not specific to image classification and could for instance be instantiated to predict the evolution of a physical system with symmetries (Noether's theorem), as long as a generating set of differential invariants can be efficiently computed for the corresponding symmetry group.

Following the line of existing work with respect to the testing of equivariant algorithms, we have built our numerical experiments from the ROTMNIST dataset, and we emphasize here that we did not use any kind of data augmentation technique for the training step. More precisely, the ROTMNIST dataset was built in (Larochelle et al. 2007) from the original MNIST digits by applying to the original samples random rotations with angles sampled uniformly in [0, 2π]. In the following, algorithms have been trained on the 12k training samples and all results have been obtained with a TensorFlow based implementation of our approach running on a GeForce RTX 2080 Nvidia card.

We have used an EqPdeNet network with a PDE layer aiming to build equivariant data representations with respect to the translation group, and either the rotation or the scaling group. More precisely, the PDE layer includes two PDE built from the differential invariants ∂φ^{SE(2)}_{u,2} and two others combining those of ∂φ^{ΛR*_+(2)}_{u,2}, whose outputs are then linearly combined.

To illustrate the benefits of our approach when compared to corresponding fully connected neural networks (FCNN) from both accuracy and robustness standpoints, we have built several scenarios from the original ROTMNIST testing set, namely

• iso: a random isometry, i.e. a combination of a random translation of (t_h, t_v) pixels and a random rotation of θ degrees, where t_h ∼ U(−2, 2), t_v ∼ U(−2, 2), and θ ∼ U(−30, 30), is applied to each of the original testing samples.

• sca: a random scaling transform (x, y) → (λx, λy) with parameter λ ∼ U(2/3, 1) is applied to each of the original testing samples.

where U(a, b) refers to the uniform distribution on the interval [a, b].
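For concreteness, the two perturbation scenarios can be generated, for instance, with scipy.ndimage as in the sketch below (our own illustration; the exact resampling and interpolation choices used for the reported numbers are not specified in the text):

```python
import numpy as np
from scipy.ndimage import affine_transform, rotate, shift

rng = np.random.default_rng(0)

def iso_perturb(img):
    """iso scenario: random translation (t_h, t_v) ~ U(-2, 2) pixels combined
    with a random rotation theta ~ U(-30, 30) degrees."""
    t_h, t_v = rng.uniform(-2.0, 2.0, size=2)
    theta = rng.uniform(-30.0, 30.0)
    out = rotate(img, theta, reshape=False, order=1)
    return shift(out, (t_v, t_h), order=1)          # (rows, cols) = (vertical, horizontal)

def sca_perturb(img):
    """sca scenario: scaling (x, y) -> (lambda x, lambda y) about the image
    center, with lambda ~ U(2/3, 1)."""
    lam = rng.uniform(2.0 / 3.0, 1.0)
    center = (np.array(img.shape) - 1) / 2.0
    matrix = np.eye(2) / lam                        # maps output coords to input coords
    offset = center - matrix @ center               # keep the center fixed
    return affine_transform(img, matrix, offset=offset, order=1)

# Perturbed copies of a test set X_test of shape [N, 28, 28]:
# X_iso = np.stack([iso_perturb(x) for x in X_test])
# X_sca = np.stack([sca_perturb(x) for x in X_test])
```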
The accuracy results in each scenario, obtained after averaging over 10 instances of testing to smooth out the statistical noise, are given in Tables 1 and 2, together with the corresponding standard deviations (in parentheses).

Table 1: Accuracy of the EqPdeNet network in several scenarios after training on the original ROTMNIST training samples

#param    Test         iso          sca
13033     70.7 (2.7)   37.3 (1.6)   23.1 (1.3)
26537     77.9 (0.6)   39.6 (1.3)   24.2 (1.1)
55081     82.7 (0.7)   43.8 (0.9)   25.8 (1.4)
118313    86.8 (0.6)   48.8 (1.1)   28.2 (1.0)
269353    89.9 (0.8)   53.0 (0.9)   30.2 (1.1)

Table 2: Accuracy of FCNN in several scenarios after training on the original ROTMNIST training samples

#param    Test         iso          sca
13002     65.5 (1.5)   36.7 (1.4)   23.5 (0.6)
26506     73.4 (0.9)   37.9 (1.2)   23.4 (0.7)
55050     80.8 (0.5)   42.4 (1.0)   25.0 (1.2)
118282    85.7 (0.4)   47.0 (0.8)   26.9 (0.9)
269322    87.7 (0.5)   49.5 (0.6)   28.3 (0.8)

We see that the accuracy on the testing set is consistently higher with our approach than with the corresponding FCNN, for all the considered numbers of parameters. With respect to robustness, higher accuracies are reached with our approach in the iso and sca testing scenarios as the number of parameters increases, consistently with the increase of the overfitting risk.

Hence, although less performant than G-CNN, which are able to achieve a testing accuracy of almost 99% with SE(2) equivariance (Finzi et al. 2020), because of its simpler structure and approximate equivariance only, our EqPdeNet approach does provide material improvements with respect to the usual FCNN, from both accuracy and robustness standpoints.

Conclusions and Further Work

In this paper we proposed a hybrid architecture with a first PDE-based layer made equivariant to generic group actions by leveraging on differential invariants theory. This structure allows achieving simultaneous approximate equivariance with respect to several group actions by aggregating the learned inner representations through a dimension reduction layer feeding deeper fully connected layers.
In order to make the approach practical, we have specified an end-to-end training method compatible with the usual automatic differentiation frameworks, in which the numerical approximations to the several PDE solutions are obtained through the use of fixed-weight convolution operators.

We have performed numerical testing on the ROTMNIST dataset and have shown the superiority of our approach from both accuracy and robustness standpoints when compared to fully connected neural networks. Our results are however below those reported for G-CNN, as our approach is simpler and does not ensure strict equivariance. However, the PDE built from the differential invariants are easier to interpret than group-based convolution kernels.

Although we believe the approach and our preliminary numerical results to be promising, additional work is needed for deriving rigorous rules with respect to hyperparameter setting. In particular, a theoretical analysis of the convergence of the discretization units for several Lie groups and PDE types would be valuable.

Finally, by leveraging on the interpretability feature of our approach, we plan to conduct analyses of the learned equivariant representations so as to refine the choice of the parametric form of the PDE to be considered, to study the opportunity to use partial specification techniques, and to discuss some safety and certification aspects. Also, as done in (Finzi et al. 2020) to model convolution kernels, using a small neural network instead of a multivariate polynomial for the parameterization of the differential invariants through the function F_θ may help improve the expressiveness of our approach.

References

Baldi, P.; Sadowski, P.; and Whiteson, D. 2014. Searching for exotic particles in high-energy physics with deep learning. Nature Communications 5(1): 1–9.

Bekkers, E. J. 2019. B-spline CNNs on Lie groups. arXiv preprint arXiv:1909.12057.

Carrasquilla, J.; and Melko, R. G. 2017. Machine learning phases of matter. Nature Physics 13(5): 431–434.

Chen, R. T.; Rubanova, Y.; Bettencourt, J.; and Duvenaud, D. K. 2018. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, 6571–6583.

Cohen, T.; and Welling, M. 2016. Group equivariant convolutional networks. In International Conference on Machine Learning, 2990–2999.

Cohen, T. S.; Geiger, M.; Köhler, J.; and Welling, M. 2018. Spherical CNNs. arXiv preprint arXiv:1801.10130.

Esteves, C.; Allen-Blanchette, C.; Makadia, A.; and Daniilidis, K. 2018. Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision (ECCV), 52–68.

Fang, C.; Zhao, Z.; Zhou, P.; and Lin, Z. 2017. Feature learning via partial differential equation with applications to face recognition. Pattern Recognition 69: 14–25.

Finzi, M.; Stanton, S.; Izmailov, P.; and Wilson, A. G. 2020. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. arXiv preprint arXiv:2002.12880.

Gens, R.; and Domingos, P. M. 2014. Deep symmetry networks. In Advances in Neural Information Processing Systems, 2537–2545.

Hornik, K.; Stinchcombe, M.; White, H.; et al. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2(5): 359–366.

Huang, Z.; Wan, C.; Probst, T.; and Van Gool, L. 2017. Deep learning on Lie groups for skeleton-based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6099–6108.

Hubert, E. 2009. Differential invariants of a Lie group action: Syzygies on a generating set. Journal of Symbolic Computation 44(4): 382–416. doi:10.1016/j.jsc.2008.08.003. URL http://www.sciencedirect.com/science/article/pii/S0747717108001089.

Larochelle, H.; Erhan, D.; Courville, A.; Bergstra, J.; and Bengio, Y. 2007. An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. In Proceedings of the 24th International Conference on Machine Learning, ICML '07, 473–480. New York, NY, USA: Association for Computing Machinery. doi:10.1145/1273496.1273556.

LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11): 2278–2324.

Long, Z.; Lu, Y.; Ma, X.; and Dong, B. 2018. PDE-Net: Learning PDEs from Data. In Proceedings of Machine Learning Research, volume 80, 3208–3216. Stockholmsmässan, Stockholm, Sweden: PMLR. URL http://proceedings.mlr.press/v80/long18a.html.
Marcos, D.; Volpi, M.; and Tuia, D. 2016. Learning rotation invariant convolutional filters for texture classification. In 2016 23rd International Conference on Pattern Recognition (ICPR), 2012–2017. IEEE.

Olver, P. 1993. Applications of Lie Groups to Differential Equations. New York, NY, USA: Springer-Verlag.

Olver, P. 2016. Equivariant Moving Frames for Euclidean Surfaces.

Oyallon, E.; and Mallat, S. 2015. Deep roto-translation scattering for object classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2865–2873.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378: 686–707.

Raissi, M.; Yazdani, A.; and Karniadakis, G. E. 2020. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science 367(6481): 1026–1030.

Ruthotto, L.; and Haber, E. 2019. Deep Neural Networks Motivated by Partial Differential Equations. Journal of Mathematical Imaging and Vision 62: 352–364.

Shen, Z.; He, L.; Lin, Z.; and Ma, J. 2020. PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions. arXiv preprint arXiv:2007.10408.

Simard, P.; LeCun, Y.; Denker, J. S.; and Victorri, B. 1998. Transformation Invariance in Pattern Recognition - Tangent Distance and Tangent Propagation. In Neural Networks: Tricks of the Trade, 239–274. Berlin, Heidelberg: Springer-Verlag.

Smets, B.; Portegies, J.; Bekkers, E.; and Duits, R. 2020. PDE-based Group Equivariant Convolutional Neural Networks. arXiv preprint arXiv:2001.09046.

Szegedy, C.; Ioffe, S.; Vanhoucke, V.; and Alemi, A. A. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence.

Van Nieuwenburg, E. P.; Liu, Y.-H.; and Huber, S. D. 2017. Learning phase transitions by confusion. Nature Physics 13(5): 435–439.

Weiler, M.; Hamprecht, F. A.; and Storath, M. 2018. Learning steerable filters for rotation equivariant CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 849–858.

Worrall, D. E.; Garbin, S. J.; Turmukhambetov, D.; and Brostow, G. J. 2017. Harmonic networks: Deep translation and rotation equivariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5028–5037.

Xiong, W.; Droppo, J.; Huang, X.; Seide, F.; Seltzer, M.; Stolcke, A.; Yu, D.; and Zweig, G. 2016. Achieving human parity in conversational speech recognition. arXiv preprint arXiv:1610.05256.

Zaheer, M.; Kottur, S.; Ravanbakhsh, S.; Poczos, B.; Salakhutdinov, R. R.; and Smola, A. J. 2017. Deep sets. In Advances in Neural Information Processing Systems, 3391–3401.