<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xue-Cheng Tai</string-name>
          <email>xtai@norceresearch.no</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hao Liu</string-name>
          <email>haoliu@hkbu.edu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raymond H. Chan</string-name>
          <email>raymond.chan@ln.edu.hk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lingfeng Li</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics, Hong Kong Baptist University</institution>
          ,
          <addr-line>Kowloon Tong, Kowloon</addr-line>
          ,
          <country>Hong Kong SAR</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hong Kong Centre for Cerebro-Cardiovascular Health Engineering</institution>
          ,
          <country>Hong Kong SAR</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Lingnan University</institution>
          ,
          <addr-line>Tuen Mun</addr-line>
          ,
          <country>Hong Kong SAR</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Norwegian Research Centre</institution>
          ,
          <addr-line>Nygårdsgaten 112, 5008 Bergen</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work, we propose a general framework for designing neural network architectures inspired by dynamical differential equations, utilizing the operator-splitting technique. The central idea is to treat neural network design as the discretization of a continuous-time optimal control problem, where the underlying dynamics are governed by differential equations serving as constraints; the resulting scheme is then unrolled as a network. These dynamics are discretized through operator-splitting schemes, which allow complex evolution equations to be decomposed into simpler substeps. Each substep in the splitting scheme is then unrolled and interpreted as a layer in a neural network, with certain control variables modeled as learnable parameters. This formulation provides a principled way to incorporate prior knowledge about dynamics and structure into the network design. Using our theory, we give a rigorous mathematical explanation of the well-known UNet and show that it is a discretization of a simple differential equation. By adding regularization to UNet, we can also derive the PottsMGNet through our proposed framework.</p>
      </abstract>
      <kwd-group>
        <kwd>deep neural network</kwd>
        <kwd>control problem</kwd>
        <kwd>operator splitting</kwd>
        <kwd>UNet</kwd>
        <kwd>image segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Tai et al. [18] addressed image segmentation across noise levels via a unified framework grounded in multigrid and control
perspectives. Notably, their analysis revealed that many encoder-decoder networks implicitly implement
operator-splitting algorithms for control problems. Further advancing this interplay, Liu et al. [20]
proposed a double-well network that learns region force terms in the Potts model through neural
representations. We also want to mention that the work [21] has provided another way to interpret
neural networks through multigrid techniques.</p>
      <p>In this work, we aim to present a general framework for designing neural networks based on dynamical differential equations using the operator-splitting technique. The basic idea is to discretize a continuous control problem, with the dynamical system as a constraint, by operator-splitting schemes. We then unroll the splitting scheme as a neural network, with some control variables parameterized as learnable modules. We also illustrate this approach through the examples of UNet and PottsMGNet.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Central Ideas</title>
      <p>A central insight of our work is that deep neural networks (DNNs) can be naturally derived as discretizations of continuous control problems governed by partial differential equations (PDEs). This perspective provides a unifying framework to design and interpret network architectures through dynamical systems and optimal control.</p>
      <p>Consider a continuous control problem where the state evolution is described by a PDE of the form
$$\begin{cases} \dfrac{\partial u}{\partial t} = \mathcal{A}(u, \theta) &amp; \text{for } (x, t) \in \Omega \times [0, T], \\ u(0) = u_0 &amp; \text{in } \Omega, \end{cases} \tag{1}$$
where $\Omega$ is the spatial computational domain and $\mathcal{A}$ is a differential operator parameterized by the control variable $\theta$, which could depend on both $x$ and $t$ or on one of them. The function $u_0$ is the initial condition. By properly discretizing this PDE in time by operator-splitting methods and unrolling the dynamics over discrete steps, we can derive a numerical scheme for solving (1). It is shown that this scheme has the same architecture as a network, for which the control variables play the role of network learnable parameters. Based on this scheme, one optimizes the control variables to minimize a given cost functional so that $u(T)$ matches the target state. The optimization procedure is the same as network training.</p>
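      <p>To make this concrete, the following is a minimal sketch (our illustration, not code from any cited work) that unrolls an explicit Euler discretization of (1), $u^{n+1} = u^n + \tau \mathcal{A}(u^n, \theta^n)$, into a network; the class names are hypothetical and $\mathcal{A}$ is modeled, for illustration only, as a small convolution:</p>
      <preformat>
# Minimal sketch (ours): unrolling the explicit Euler discretization
#   u^{n+1} = u^n + tau * A(u^n, theta^n)
# of the control problem (1) into a residual network.
import torch
import torch.nn as nn

class ControlledDynamics(nn.Module):
    """One learnable realization of A(u, theta); here simply a 3x3 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, u):
        return self.conv(u)

class UnrolledNet(nn.Module):
    """Each time step t_n owns its control variables theta^n, i.e. one layer per step."""
    def __init__(self, channels, num_steps, tau):
        super().__init__()
        self.layers = nn.ModuleList(ControlledDynamics(channels) for _ in range(num_steps))
        self.tau = tau

    def forward(self, u0):
        u = u0
        for A_n in self.layers:        # n = 0, ..., N - 1
            u = u + self.tau * A_n(u)  # explicit Euler step: a ResNet-like layer
        return u

# Training optimizes the control variables Theta = {theta^n} so that the output
# matches the target state u(T), e.g. by minimizing a mean-squared loss.
      </preformat>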
      <p>This is a general framework that applies to all networks. From another perspective, it also implies that for any given network, we can find a corresponding control problem so that this network is a numerical scheme (with proper discretization and unrolling) solving this problem. This framework offers several advantages:
• Theoretical analysis: Tools from PDE theory (e.g., stability, convergence, and regularity analysis) can be applied to study the behavior of DNNs.
• Architecture design: New network architectures can be derived by choosing appropriate PDE models and discretization schemes. This makes it possible to equip the designed neural networks with different kinds of physical properties that are often desired but have not been achievable with existing networks.
• Scalability, interpretability and explainability: The continuous viewpoint provides insights into the role of depth, initialization, and parameterization in DNNs.</p>
      <p>In the following sections, we give the details of this idea and demonstrate how to use operator-splitting methods and unrolling to discretize the control problems. Applications of this framework to UNet [1] and PottsMGNet [18] will be discussed.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Discretization by Operator-Splitting Methods</title>
      <p>Assume that we have decomposed the differential operator $\mathcal{A}$ into the following form:
$$\frac{\partial u}{\partial t} = \sum_{i=1}^{m} \mathcal{A}_i(u, \theta_i), \qquad u(0) = u_0,$$
where each $\mathcal{A}_i(u, \theta_i)$ is possibly nonlinear and acts as a distinct operator. Operator-splitting methods approximate the full solution by evolving $u$ through each operator sequentially or in parallel over a time step $\tau$. For example, the sequential splitting [22, 23] evolves the solution by applying the sub-solvers sequentially:
$$u^{n+1} = \mathcal{S}_{m,\tau} \circ \cdots \circ \mathcal{S}_{1,\tau}(u^n),$$
where $\mathcal{S}_{i,\tau}(\hat{u})$ denotes the solution operator of
$$\frac{\partial v}{\partial t} = \mathcal{A}_i(v, \theta_i), \qquad v(t_n) = \hat{u},$$
evaluated at $t_n + \tau$. The solution operator $\mathcal{S}_{i,\tau}(\hat{u})$ can either be computed explicitly or approximated numerically. The parallel splitting scheme [24] solves each sub-problem in parallel and combines the results:
$$u^{n+1} = \frac{1}{m} \sum_{i=1}^{m} \mathcal{S}_{i,\tau}(u^n).$$
We may also apply the sequential and parallel schemes together to form hybrid splitting schemes, as explained in Appendix A; see also [18, Appendix D] for more details.</p>
      <p>After splitting, unrolling methods are used to construct neural networks. Unrolling maps the splitting scheme to a neural network architecture, where each operator $\mathcal{A}_i$ becomes a network layer. Specifically, we have the following construction:
1. Layer structure: Each time step $u^n \to u^{n+1}$ is unrolled into $m$ sub-layers, one per operator $\mathcal{A}_i$. If $\mathcal{A}_i$ is known (e.g., a Laplacian), we implement $\mathcal{S}_{i,\tau}$ as a fixed layer. If $\mathcal{A}_i$ contains unknown learnable parameters, we represent $\mathcal{S}_{i,\tau}$ as a trainable neural network block.
2. Parameterization: Replace each unknown $\mathcal{A}_i$ with a learnable module $\hat{\mathcal{A}}_i(u, \theta_i)$ with parameters $\theta_i(t_n)$. The operator $\mathcal{S}_{i,\tau}$ then becomes a learnable layer $\hat{\mathcal{S}}_{i,\tau}$.
3. Unrolled network: By replacing $\mathcal{S}_{i,\tau}$ with $\hat{\mathcal{S}}_{i,\tau}$ in the splitting scheme, we get a neural network model $\mathcal{N}_\Theta$ with learnable parameters $\Theta = \{\theta_i(t_n)\}_{i=1,n=1}^{m,N}$.</p>
      <p>This unrolled network can be trained by minimizing the discrepancy between the predictions from given initial conditions $u_0^s$ and the corresponding ground-truth solutions $u_T^s$ (e.g., MSE):
$$\mathcal{L}(\Theta) = \sum_{s=1}^{\hat{S}} \big\| \mathcal{N}_\Theta(u_0^s) - u_T^s \big\|^2,$$
where $\{u_0^s, u_T^s\}_{s=1}^{\hat{S}}$ is a set of training samples. Let us emphasize that we can use other loss functions instead of MSE, such as the cross-entropy loss. Different deep neural networks in the literature are just different approximations of some specially chosen PDEs or ODEs. In the following section, we will show that the well-known UNet [1] is just an unrolling of a numerical approximation of a special PDE, see (2).</p>
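      <p>As a hedged illustration of this construction (ours; the architecture of the learnable sub-solvers is an assumption, not prescribed by the theory), the following sketch unrolls one time step of the sequential scheme and of the parallel scheme with learnable sub-solvers $\hat{\mathcal{S}}_{i,\tau}$:</p>
      <preformat>
# Sketch (ours): one unrolled time step of the sequential and parallel splitting schemes.
import torch
import torch.nn as nn

class SubSolver(nn.Module):
    """Learnable layer S_hat_{i,tau} approximating the solution operator of
    dv/dt = A_i(v, theta_i) over one time step tau."""
    def __init__(self, channels, tau):
        super().__init__()
        self.block = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.tau = tau

    def forward(self, u):
        return u + self.tau * self.block(u)

def sequential_step(u, solvers):
    # u^{n+1} = S_{m,tau} o ... o S_{1,tau} (u^n)
    for S in solvers:
        u = S(u)
    return u

def parallel_step(u, solvers):
    # u^{n+1} = (1/m) * sum_i S_{i,tau}(u^n)
    return sum(S(u) for S in solvers) / len(solvers)

solvers = [SubSolver(channels=8, tau=0.1) for _ in range(4)]   # m = 4 sub-operators
u1 = sequential_step(torch.rand(1, 8, 32, 32), solvers)
u2 = parallel_step(torch.rand(1, 8, 32, 32), solvers)
      </preformat>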
    </sec>
    <sec id="sec-4">
      <title>4. UNet as Multigrid Operator Splitting</title>
      <p>The well-known deep neural network UNet [1] is designed to segment images into foreground and background, represented by a binary image. By applying the framework of Sections 2-3, we show that it is just a numerical approximation of a PDE.</p>
      <sec id="sec-4-1">
        <title>4.1. The control problem</title>
        <p>Given an input image $f$ defined on the image domain $\Omega$, we consider the following initial value problem
$$\begin{cases} \dfrac{\partial u}{\partial t}(x, t) = W(x, t) * u(x, t) + b(t) - \varepsilon \ln \dfrac{u(x, t)}{1 - u(x, t)}, &amp; (x, t) \in \Omega \times (0, T], \\ u(x, 0) = G(f(x)), &amp; x \in \Omega, \end{cases} \tag{2}$$
where $*$ denotes convolution, and $W(x, t)$, $b(t)$ are control variables which will be learned in the training process. The convolution kernel $W(x, t) : D \times (0, T] \to \mathbb{R}$ is supported on some spatial domain $D \subset \mathbb{R}^2$. In practice, $D$ can be different from $\Omega$, and is usually a small region. In this paper, for simplicity, we take $D = \Omega = [-1, 1]^2$. When computing convolutions, we extend functions to $\mathbb{R}^2$ by padding the convolution kernels $W(x, t)$ by 0 and the solution function $u(x, t)$ periodically.</p>
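        <p>As an illustration (ours; the kernel, bias and $\varepsilon$ below are placeholders), the right-hand side of (2) can be evaluated with the periodic extension of $u$ described above:</p>
        <preformat>
# Sketch (ours): evaluating the right-hand side of (2),
#   W * u + b - eps * ln(u / (1 - u)),
# with u extended periodically, as described in the text.
import torch
import torch.nn.functional as F

def rhs(u, W, b, eps=1.0):
    # u: (1, 1, H, W_img) with values in (0, 1); W: (1, 1, S, S) kernel; b: scalar
    S = W.shape[-1]
    u_pad = F.pad(u, (S // 2,) * 4, mode="circular")    # periodic extension of u
    conv = F.conv2d(u_pad, W)                           # the convolution W * u
    return conv + b - eps * torch.log(u / (1.0 - u))    # entropy term keeps u in (0, 1)

u0 = torch.full((1, 1, 8, 8), 0.5)      # an initial probability map G(f)
W = 0.1 * torch.randn(1, 1, 3, 3)       # placeholder learnable kernel
print(rhs(u0, W, b=0.0).shape)          # torch.Size([1, 1, 8, 8])
        </preformat>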
        <p>Equation (2) evolves any input image $f$ into a probability function $u(x, T) \in [0, 1]$; see [18, Appendix A] and [25] for more details and the derivation of this equation. Above, $G(f)$ is some operation that generates the initial condition from $f$. Due to the appearance of the term $\varepsilon \ln \frac{u}{1-u}$, the solution of the above equation is forced to stay in $(0, 1)$. For numerical considerations, and to make the connection between operator-splitting methods and neural networks clearer, we introduce a constraint $u(x, t) \geq 0$ into the control problem. Due to the property of the term $\varepsilon \ln \frac{u}{1-u}$, the introduced constraint does not change the solution. Next, we incorporate the constraint into the equation by introducing an indicator function:
$$\begin{cases} \dfrac{\partial u}{\partial t} - W(x, t) * u - b(t) + \varepsilon \ln \dfrac{u}{1 - u} + \partial \mathcal{I}_\Sigma(u) \ni 0, &amp; (x, t) \in \Omega \times (0, T], \\ u(x, 0) = G(f(x)), &amp; x \in \Omega, \end{cases} \tag{3}$$
where
$$\Sigma = \{ u : u(x, t) \geq 0 \text{ for } (x, t) \in \Omega \times (0, T] \},$$
$\mathcal{I}_\Sigma$ is the indicator function of $\Sigma$ and $\partial \mathcal{I}_\Sigma$ denotes the subdifferential of $\mathcal{I}_\Sigma$. By solving (3) for any input image $f$, the initial value $u(x, 0)$ evolves to $u(x, T)$, which is a probability function. By choosing $\varepsilon$ small, we can force this probability function to be close to a binary function. For simplicity of explanation, we will take $\varepsilon = 1$, which is also the value the original UNet uses.</p>
        <p>To solve (3) numerically, [26] decomposed the control variables (learnable variables) $\{W(x, t), b(t)\}$ in (3) using the multigrid idea. Here, we introduce a simplified version of their algorithm that recovers a simple UNet-like structure. The discussion can be generalized to recover the original UNet structure from (3).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Decomposition of control variables</title>
        <p>In traditional multigrid methods, a popular framework is the "fine grid → coarse grid → fine grid" strategy [27, 28]. Such a form of V-cycle multigrid method can be interpreted as space decomposition and subspace correction [29, 30].</p>
        <p>Motivated by the fact that images can be viewed as piecewise constant functions in practice, we consider piecewise constant approximations at different resolutions. Let $J &gt; 0$ be some integer. For the finest resolution, we discretize $\Omega$ into $(2^J)^2$ small squares, each of size $2^{-J+1} \times 2^{-J+1}$. They are called pixels in image processing and rectangular elements in finite element methods. In real applications, the finest mesh is the grid on which the input image is given. Denote $h_1 = 2^{-J+1}$ and this set of pixels by $\mathcal{T}_1$. Starting with $\mathcal{T}_1$, we can define a sequence of coarser grids $\{\mathcal{T}_j\}_{j=1}^{J+1}$. Specifically, let $h_j = 2^{j-1} h_1 = 2^{j-J}$. The set $\mathcal{T}_j$ consists of coarse pixels of size $h_j \times h_j$ that form a partition of $\Omega$. For each $\mathcal{T}_j$, we can define a set of piecewise constant basis functions: for the $k$-th pixel in $\mathcal{T}_j$, denoted by $\Omega_j^k$, we define $\psi_j^k(x)$ such that it equals 1 if $x \in \Omega_j^k$ and 0 otherwise. We denote the space spanned by this set of functions by $V_j$. We have
$$V_{J+1} \subset V_J \subset \cdots \subset V_1. \tag{4}$$
Note that each function in $V_j$ has $2^{2(J+1-j)}$ coefficients and $\dim(V_j) = 2^{2(J+1-j)}$. Traditional multigrid methods solve the decomposed subproblems by simple Gauss-Seidel or Jacobi iterations. Here, the multigrid spaces are used in a different way.</p>
        <p>We will decompose $\{W(x, t), b(t)\}$ into a sum of variables of different scales over the multigrid hierarchy. Then, we use a hybrid splitting method to solve (3) so that all decomposed variables are distributed into several subproblems, which are solved sequentially or in parallel. Within one iteration of the splitting method, all decomposed variables are swept through. The general splitting idea is to split the operators based on a V-cycle according to the grid level, cf. Figure 1 for an illustration. We decompose all terms on the right-hand side of (3) via the following steps; a PyTorch reading of steps 3 and 4 is sketched after this list.</p>
        <p>1. According to the idea of a V-cycle, we decompose $W(x, t)$ and $b(t)$ as
$$W(x, t) = \bar{W}(x, t) + \tilde{W}(x, t), \qquad b(t) = \bar{b}(t) + \tilde{b}(t). \tag{5}$$
These variables will be further decomposed next. Above, $\bar{W}, \bar{b}$ are sums of control variables in the left branch of the V-cycle, and $\tilde{W}, \tilde{b}$ are sums of control variables in the right branch; see the illustration in Figure 1. We also decompose the nonlinear operations $-\varepsilon \ln \frac{u}{1-u}$ and $\partial \mathcal{I}_\Sigma(u)$ into $\bar{N}(u) + \tilde{N}(u)$, where $\bar{N}(u)$ contains the nonlinear operations in the left branch and $\tilde{N}(u)$ contains the nonlinear operations in the right branch. In particular, we put $-\varepsilon \ln \frac{u}{1-u}$ in $\tilde{N}(u)$ only, i.e., $\bar{N}(u) = \partial \mathcal{I}_\Sigma(u)$ and $\tilde{N}(u) = \partial \mathcal{I}_\Sigma(u) - \varepsilon \ln \frac{u}{1-u}$. Later, we will show that our operator-splitting method recovers a simplified UNet, in which the operation $-\varepsilon \ln \frac{u}{1-u}$ corresponds to the sigmoid layer at the end of UNet.</p>
        <p>2. We further decompose the operators into components at different grid levels:
$$\bar{W}(x, t) = \sum_{j=1}^{J} \bar{W}_j(x, t), \qquad \bar{b}(t) = \sum_{j=1}^{J} \bar{b}_j(t), \qquad \bar{N}(u) = \sum_{j=1}^{J} \bar{N}_j(u), \tag{6}$$
$$\tilde{W}(x, t) = \sum_{j=1}^{J} \tilde{W}_j(x, t) + W^*(x, t), \qquad \tilde{b}(t) = \sum_{j=1}^{J} \tilde{b}_j(t) + b^*(t), \qquad \tilde{N}(u) = \sum_{j=1}^{J} \tilde{N}_j(u) + N^*(u), \tag{7}$$
where $\bar{W}_j, \bar{b}_j, \tilde{W}_j, \tilde{b}_j$ contain the control variables at grid level $j$, and $W^*, b^*$ are control variables applied to the output of the V-cycle at the finest mesh. The operators $\bar{N}_j(u) = \tilde{N}_j(u) = \partial \mathcal{I}_\Sigma(u)$ are applied to the intermediate solution at grid level $j$, while the operator $N^*(u) = -\varepsilon \ln \frac{u}{1-u}$ is applied to the output of the V-cycle at the finest mesh.</p>
        <p>3. At grid level $j$ in the left branch, $\bar{W}_j$ is defined on $V_j$, which contains $2^{2(J+1-j)}$ coefficients to be learned. This leads to a high complexity when $j$ is small. In order to decrease the complexity, we decompose $\bar{W}_j$ into a sum of components $\bar{W}_j^k$, each of which is nonzero only on a small patch $D_j^k$ of size $(h_j S) \times (h_j S)$; the patches are denoted by $\{D_j^k\}_{k=1}^{K_j}$, where $K_j$ is their total number. Normally, we take $S = 3$ or $5$, and all $D_j^k$ together shall cover $\Omega$; the patches for $S = 3$, $j = 1$ are illustrated in Figure 1. Each $\bar{W}_j^k$ equals $\bar{W}_j$ over $D_j^k$ if the support sets $D_j^k$ do not overlap. We also decompose $\bar{b}_j$ and $\bar{N}_j$ into sums of $K_j$ components. Thus we have
$$\bar{W}_j(x, t) = \sum_{k=1}^{K_j} \bar{W}_j^k(x, t), \qquad \bar{b}_j(t) = \sum_{k=1}^{K_j} \bar{b}_j^k(t), \qquad \bar{N}_j(u) = \sum_{k=1}^{K_j} \bar{N}_j^k(u). \tag{8}$$
Note that the $\bar{W}_j^k$'s have different supports, which makes specifying the support of each function complicated. In order to simplify the settings, we shift each $\bar{W}_j^k$ so that it is centered at $(0, 0)$. Specifically, for each $\bar{W}_j^k$, let $\delta_j^k$ be a shifting kernel so that $\bar{W}_j^k = \delta_j^k * \hat{W}_j^k$, where $\hat{W}_j^k$ is a shifted version of $\bar{W}_j^k$ supported on a square centered at $(0, 0)$. According to the shift-invariance property of convolution, we have
$$\bar{W}_j(x, t) * u(x, t) = \sum_{k=1}^{K_j} \bar{W}_j^k(x, t) * u(x, t) = \sum_{k=1}^{K_j} \big( \delta_j^k(x) * \hat{W}_j^k(x, t) \big) * u(x, t). \tag{9}$$
Thus, learning a kernel with $2^{2(J+1-j)}$ coefficients is converted to learning $K_j$ kernels $\hat{W}_j^k(x, t)$ with $S \times S$ coefficients around the origin. In implementations, $K_j$ corresponds to the number of channels in a CNN, and $\{\hat{W}_j^k\}_{k=1}^{K_j}$ corresponds to CNN convolution kernels of size $S \times S$. In our experiments, for simplicity, we omit the shifting kernel $\delta_j^k$ and replace $(\delta_j^k(x) * \hat{W}_j^k(x, t)) * u(x, t)$ by $\hat{W}_j^k(x, t) * u(x, t)$, and it gives good results. This is also what is done in the original UNet. The same decomposition is done for $\tilde{W}_j$, $\tilde{b}_j$ and $\tilde{N}_j$.</p>
        <p>4. For each grid level $j$ and each channel $k$, we further decompose
$$\hat{W}_j^k(x, t) = \sum_{i=1}^{K_{j-1}} \hat{W}_j^{k,i}(x, t), \tag{10}$$
where each $\hat{W}_j^{k,i}(x, t)$ has support around the origin with $S \times S$ coefficients. The purpose of this decomposition is to increase the number of parameters of the training variables. In our algorithm, each channel computes an intermediate result; in the computation, $\hat{W}_j^{k,i}$ is used to convolve with the intermediate solution in the $i$-th channel at grid level $j - 1$. The same decomposition is done for the right-branch variables. Before the decomposition, the control variables are $\{W(x, t), b(t)\}$. After these decompositions, the control variables are $\hat{W}_j^{k,i}(x, t)$, $\bar{b}_j^k(t)$, $\hat{\tilde{W}}_j^{k,i}(x, t)$, $\tilde{b}_j^k(t)$, $W^{*,k}(x, t)$ and $b^*(t)$:
$$\bar{W}(x, t) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \sum_{i=1}^{K_{j-1}} \hat{W}_j^{k,i}(x, t), \tag{11}$$
$$\tilde{W}(x, t) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \sum_{i=1}^{K_{j-1}} \hat{\tilde{W}}_j^{k,i}(x, t) + \sum_{k=1}^{K_1} W^{*,k}(x, t), \tag{12}$$
$$\bar{b}(t) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \bar{b}_j^k(t), \qquad \tilde{b}(t) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \tilde{b}_j^k(t) + b^*(t), \tag{13}$$
and the operators $\bar{N}(u)$, $\tilde{N}(u)$ are decomposed as
$$\bar{N}(u) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \bar{N}_j^k(u), \qquad \tilde{N}(u) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \tilde{N}_j^k(u) + N^*(u), \tag{14}$$
with $\bar{N}_j^k(u) = \tilde{N}_j^k(u) = \partial \mathcal{I}_\Sigma(u)$ and $N^*(u) = -\varepsilon \ln \frac{u}{1-u}$.</p>
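        <p>In PyTorch terms, steps 3 and 4 say that all decomposed kernels $\{\hat{W}_j^{k,i}\}$ at one grid level act together as a single multi-channel convolution; a minimal sketch (ours, with illustrative channel counts):</p>
        <preformat>
# Sketch (ours): the kernels {W_hat_j^{k,i}} at grid level j form one multi-channel
# convolution with K_{j-1} input channels, K_j output channels and S x S taps each.
import torch.nn as nn

S = 3                   # patch/kernel size; S = 3 or 5 in the text
K_prev, K_j = 16, 32    # illustrative channel counts K_{j-1} and K_j

conv_level_j = nn.Conv2d(
    in_channels=K_prev,       # one kernel W_hat_j^{k,i} per input channel i
    out_channels=K_j,         # one output channel per patch index k
    kernel_size=S,
    padding=S // 2,
    padding_mode="circular",  # periodic extension, as in Section 4.1
)
        </preformat>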
        <p>The original PDE (2) is turned into
$$\begin{cases} \dfrac{\partial u}{\partial t} = \bar{W} * u + \tilde{W} * u + \bar{b} + \tilde{b} + \bar{N}(u) + \tilde{N}(u), &amp; (x, t) \in \Omega \times [0, T], \\ u(x, 0) = G(f), &amp; x \in \Omega. \end{cases} \tag{15}$$
To solve (15), we use a hybrid splitting method shown in Appendix A. Divide the time interval $[0, T]$ into $N_t$ subintervals with time step $\tau = T / N_t$. Denote the computed solution at time $t_n = n\tau$ by $u^n$. The resulting algorithm that updates $u^n$ to $u^{n+1}$ is summarized in Algorithm 1, where $\mathcal{D}$ and $\mathcal{U}$ represent the downsampling and upsampling operators, respectively. For simplicity, variable dependencies on $x$ are omitted.</p>
        <p>[Algorithm 1: one time step of the hybrid splitting method for (15). Data: the solution $u^n$ at time $t_n$. Sweeping down and up the V-cycle, at each grid level $j$ compute the channel-wise intermediate solutions $u_j^k \in V_j$ by solving subproblems of the form (16) and (18), with relaxation steps (17) between the levels; combine the channels as the average $\frac{1}{K_j} \sum_{k=1}^{K_j} u_j^k$; the output layer (19) then produces $u^{n+1}$.]</p>
      </sec>
      <sec id="sec-4-3a">
        <title>4.3. On the solutions to (16), (18) and (19)</title>
        <p>Observe that (16) and (18) are of the form
$$\frac{u - u^*}{\tau} - \sum_{i=1}^{I} \hat{W}_i * u_i^* - \hat{b} + \partial \mathcal{I}_\Sigma(u) \ni 0, \tag{20}$$
where $\tau$ is some constant, $u^* = \frac{1}{I} \sum_{i=1}^{I} u_i^*$ is known, each $\hat{W}_i$ is a convolution kernel, $I$ is the number of input channels, and $\hat{b}$ is a bias function. The solution to (20) is computed by a two-sub-step splitting method:
$$\begin{cases} \bar{u} = u^* + \tau \Big( \sum_{i=1}^{I} \hat{W}_i * u_i^* + \hat{b} \Big), \\ \dfrac{u - \bar{u}}{\tau} + \partial \mathcal{I}_\Sigma(u) \ni 0. \end{cases} \tag{21}$$
In (21), there is no difficulty in solving for $\bar{u}$ in the first sub-step, as it is an explicit step. Let us emphasize that the term $\sum_{i=1}^{I} \hat{W}_i * u_i^* + \hat{b}$ in this step is exactly the PyTorch function Conv2d. For $u$ in the second sub-step, the inclusion is, in fact, a projection onto $\Sigma$. Its closed-form solution is given as
$$u = \max\{\bar{u}, 0\} = \mathrm{ReLU}(\bar{u}), \tag{22}$$
where $\mathrm{ReLU}(z) = \max\{z, 0\}$ is the rectified linear unit.</p>
        <p>Problem (19) is of the form
$$\frac{u^{n+1} - \bar{u}_1}{\tau} - W^*(t_n) * \bar{u}_1 - b^*(t_n) - N^*(u^{n+1}) \ni 0, \tag{23}$$
where $\bar{u}_1$ denotes the output of the V-cycle. Following the steps for solving (16) and (18) above, we solve (23) by the two-sub-step splitting
$$\begin{cases} \bar{u} = \bar{u}_1 + \tau \big( W^*(t_n) * \bar{u}_1 + b^*(t_n) \big), \\ \dfrac{u^{n+1} - \bar{u}}{\tau} = -\varepsilon \ln \dfrac{u^{n+1}}{1 - u^{n+1}}. \end{cases} \tag{24}$$
The first sub-step is again an explicit step using the PyTorch function Conv2d. We solve the second sub-step approximately by a fixed point iteration. Initialize $u_0 = \bar{u}$. Given $u_p$, we update $u_{p+1}$ by solving
$$\frac{u_p - \bar{u}}{\tau} = -\varepsilon \ln \frac{u_{p+1}}{1 - u_{p+1}}, \tag{25}$$
for which we have the closed-form solution
$$u_{p+1} = \mathrm{Sig}\Big( -\frac{u_p - \bar{u}}{\varepsilon \tau} \Big), \tag{26}$$
where $\mathrm{Sig}(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function. By repeating (26) until $u_{p+1}$ converges to some function $u^\dagger$, we set $u^{n+1} = u^\dagger$. In particular, since $u_0 = \bar{u}$, the updating formula (26) always gives $u_1 = 0.5$. If we only consider a two-step fixed point iteration, we get
$$u^{n+1} = \mathrm{Sig}\Big( -\frac{0.5 - \bar{u}}{\varepsilon \tau} \Big) = \mathrm{Sig}\Big( \frac{\bar{u} - 0.5}{\varepsilon \tau} \Big). \tag{27}$$</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.4. Algorithm 1 recovers the UNet</title>
        <p>We first show that a building block of Algorithm 1 is equivalent to a layer of a simplified UNet. Each layer of UNet is a convolution layer activated by ReLU:
$$\begin{cases} \bar{z} = \sum_{i=1}^{I} W_i * u_i^* + b, \\ u = \mathrm{ReLU}(\bar{z}), \end{cases} \tag{28}$$
where each $W_i$ is a convolutional kernel and $b$ is the bias. In Algorithm 1, the building block is (20) and (23), which is solved by (21) and (24). In fact, (28) (or problem (24)) and (21) have the same form. Specifically, in the first equation of (21) we have
$$\bar{u} = u^* + \tau \Big( \sum_{i=1}^{I} \hat{W}_i * u_i^* \Big) + \tau \hat{b} = \sum_{i=1}^{I} \big( \mathbb{1}/I + \tau \hat{W}_i \big) * u_i^* + \tau \hat{b}, \tag{29}$$
where $\mathbb{1}$ denotes the identity kernel satisfying $\mathbb{1} * u = u$ for any function $u$. In (28), set
$$W_i = \mathbb{1}/I + \tau \hat{W}_i, \qquad b = \tau \hat{b}. \tag{30}$$
We then have $\bar{z} = \bar{u}$ and the two building blocks coincide. Essentially, Algorithm 1 and UNet are the same. Thus, we have shown that a simplified UNet structure (with only one convolution layer at each data resolution) is equivalent to one iteration of Algorithm 1. The UNet architecture consists of four components: encoder, decoder, bottleneck, and skip-connections, each of which has a corresponding component in the structure of Algorithm 1:
1. Encoder: The encoder in UNet corresponds to the left branch of the V-cycle in Algorithm 1. The number of data resolution levels corresponds to the number of grid levels $J$.
2. Decoder: The decoder in UNet corresponds to the right branch of the V-cycle in Algorithm 1.
3. Bottleneck: The bottleneck in UNet corresponds to the computations at the coarsest grid level in Algorithm 1.
4. Skip-layer connection: Skip-layer connections in UNet correspond to the relaxation steps (17) in Algorithm 1.
Therefore, a one-step operator splitting of the control problem (15) is exactly equivalent to a simplified UNet; a code sketch of this correspondence is given below.</p>
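        <p>To summarize the correspondence, here is a hedged sketch (ours; the channel widths, the additive skip connections and the pooling operators are simplifications) of the one-convolution-per-level UNet that one iteration of Algorithm 1 produces:</p>
        <preformat>
# Sketch (ours): the simplified UNet recovered from one iteration of Algorithm 1.
# Left branch = encoder, right branch = decoder, coarsest level = bottleneck,
# relaxation steps (17) = skip connections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedUNet(nn.Module):
    def __init__(self, channels=16, levels=3):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Conv2d(1 if j == 0 else channels, channels, 3, padding=1)
            for j in range(levels))
        self.up = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(levels))
        self.out = nn.Conv2d(channels, 1, 3, padding=1)   # output layer, W^* and b^*

    def forward(self, f):
        u, skips = f, []
        for conv in self.down:                    # left branch of the V-cycle
            u = F.relu(conv(u))                   # building block (21)-(22)
            skips.append(u)
            u = F.avg_pool2d(u, 2)                # downsampling operator D
        for conv, skip in zip(self.up, reversed(skips)):
            u = F.interpolate(u, scale_factor=2)  # upsampling operator U
            u = F.relu(conv(u)) + skip            # relaxation step (17) as a skip sum
        return torch.sigmoid(self.out(u) - 0.5)  # output layer via (27)

print(SimplifiedUNet()(torch.rand(1, 1, 32, 32)).shape)   # torch.Size([1, 1, 32, 32])
        </preformat>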
        <p>[Algorithm 2: a hybrid splitting method to solve the control problem (15) for PottsMGNet. Data: the solution $u^n$ at time step $t_n$. For each grid level, compute $u_j^k \in V_j$ by solving the corresponding subproblems; the output layer solves
$$\frac{u^{n+1} - \bar{u}_1}{\tau} - W^*(t_n) * \bar{u}_1 - b^*(t_n) - N^*(u^{n+1}) \ni 0.$$]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Case study: PottsMGNet</title>
      <p>In the previous section, we presented how to use the framework introduced in Sections 2-3 to show that the operator-splitting scheme of (15) is equivalent to a UNet. In this section, we show that PottsMGNet, proposed in [18], is another instance of this framework. Specifically, we will consider a modified control problem of (15) and a modified splitting scheme of Algorithm 1.</p>
      <sec id="sec-5-1">
        <title>5.1. Constructing a PottsMGNet</title>
        <p>We consider the control problem (15) with the decompositions (5)-(10). Here, the decompositions of the nonlinear operators $\bar{N}$ and $\tilde{N}$ in (14) are defined through logarithmic terms involving a variance parameter $\sigma^2$; we refer to [18] for their precise form. To solve this new system, we consider a multi-step operator-splitting algorithm (Algorithm 2), using the hybrid splitting method in Appendix A, for approximating the solution of (15) with the above given split of operators. When solving the subproblems (31) and (32), we adopt a similar sequential splitting method as in (21). The second substep is then defined as
$$u = (\mathcal{I} - \tau \tilde{N})^{-1}(\bar{u}),$$
which can be approximately solved by a fixed point iteration, as for the second substep in (24). By unrolling Algorithm 2, we can get a simplified PottsMGNet [18].</p>
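        <p>A small sketch (ours; the operator N below is a placeholder standing in for the PottsMGNet nonlinearity) of how the resolvent $u = (\mathcal{I} - \tau \tilde{N})^{-1}(\bar{u})$ can be approximated by a fixed point iteration:</p>
        <preformat>
# Sketch (ours): approximating u = (I - tau * N)^{-1}(u_bar) by the fixed point
# iteration u_{p+1} = u_bar + tau * N(u_p), as for the second sub-step in (24).
import torch

def resolvent(u_bar, N, tau=0.1, iters=10):
    u = u_bar.clone()
    for _ in range(iters):
        u = u_bar + tau * N(u)   # one fixed point sweep: u solves u - tau*N(u) = u_bar
    return u

# Placeholder nonlinearity standing in for the logarithmic terms of Section 5.1:
N = lambda u: -torch.log(u.clamp(1e-6, 1 - 1e-6) / (1 - u).clamp(1e-6, 1 - 1e-6))
u = resolvent(torch.full((4, 4), 0.3), N)
        </preformat>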
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Comparison with UNet</title>
        <p>Compared with the UNet structure, PottsMGNet starts with a control problem with different nonlinear terms $\bar{N}$ and $\tilde{N}$, which results in a different activation function at each layer of the resulting neural network. The position of the relaxation step is also different, which leads to a different skip-connection structure. Additionally, UNet is a one-step algorithm while PottsMGNet is a multi-step algorithm. Similar to UNet, we can also extend Algorithm 2 to the multi-channel case by further splitting the control variables.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work, we presented a framework to design novel neural networks using operator splitting of control problems. Networks designed in this way are usually robust to disturbances in the input, because the underlying control problems usually include regularization terms. We also presented how to derive UNet and PottsMGNet from this framework. In the future, we plan to apply this framework to other types of networks, such as graph neural networks.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used DeepSeek for grammar and spelling checking. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
    </sec>
    <sec id="sec-8">
      <title>Appendix A. Hybrid Splitting Schemes for Initial Value Problems and Neural Network Design</title>
      <p>We discuss a hybrid splitting scheme proposed in [18] as a powerful technique for solving initial value problems, which can be effectively leveraged to design and understand the architecture of deep neural networks. A hybrid splitting scheme combines the principles of parallel and sequential splitting schemes to decompose complex problems into more manageable subproblems.</p>
      <p>Consider the initial value problem:
$$\frac{\partial u}{\partial t} + \sum_{s=1}^{\bar{s}} \Big( \sum_{i=1}^{I_s} A_{s,i}(x, t; u) + B_s(x, t; u) \Big) + C(x, t) = 0 \quad \text{on } \Omega \times [0, T], \qquad u(0) = u_0.$$</p>
      <p>We first split it into $\bar{s}$ sequential steps, where each step consists of $I_s$ parallel substeps. At the $(n+1)$-th time step, denote the intermediate result computed in the $i$-th parallel branch of the $s$-th sequential step by $u_i^{n+s/\bar{s}}$, and define
$$u^{n+s/\bar{s}} = \frac{1}{I_s} \sum_{i=1}^{I_s} u_i^{n+s/\bar{s}}.$$
The function $u_i^{n+s/\bar{s}}$ is computed by solving
$$\frac{u_i^{n+s/\bar{s}} - u^{n+(s-1)/\bar{s}}}{I_s \tau} = -A_{s,i}\big(x, t_n; u^{n+(s-1)/\bar{s}}\big) - B_s\big(x, t_{n+1}; u_i^{n+s/\bar{s}}\big) - C_i^s(x, t_n),$$
where the operators $A_{s,i}$ are treated explicitly, while the operators $B_s$ are treated implicitly. The functions $C_i^s$ can be treated explicitly as shown here. Starting with $u^{n+0/\bar{s}} = u^n$, this algorithm iterates through $s$ from 1 to $\bar{s}$ and $i$ from 1 to $I_s$ to compute $u^{n+1}$. When $A_{s,i}$ and $B_s$ are Lipschitz continuous, this hybrid splitting scheme is first-order accurate [18, Theorem D.1], meaning that the error $\|u^{n+1} - u(t_{n+1})\|_\infty$ is of the order $O(\tau)$.</p>
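      <p>In code, the scheme is a sequential sweep over $s$ containing an averaged parallel sweep over $i$; a sketch (ours; the implicit sub-solver is left abstract):</p>
      <preformat>
# Sketch (ours): one time step of the hybrid splitting scheme, with s_bar sequential
# stages, each averaging I_s parallel branches (A treated explicitly, B implicitly).
def hybrid_step(u, stages, solve_branch):
    # stages: list over s of lists over i of (A_si, B_s, C_si) operator triples
    for branch_ops in stages:                  # sequential sweep: s = 1, ..., s_bar
        I_s = len(branch_ops)
        branches = []
        for A, B, C in branch_ops:             # parallel sweep: i = 1, ..., I_s
            # solve (u_i - u) / (I_s * tau) = -A(u) - B(u_i) - C for u_i,
            # with A and C explicit and B implicit:
            branches.append(solve_branch(u, A, B, C, I_s))
        u = sum(branches) / I_s                # combine the parallel branches
    return u
      </preformat>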
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          , U-net:
          <article-title>Convolutional networks for biomedical image segmentation</article-title>
          , in: International Conference on
          <article-title>Medical image computing and computer-assisted intervention</article-title>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M. R.</given-names>
            <surname>Siddiquee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tajbakhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          , UNet++:
          <article-title>Redesigning skip connections to exploit multiscale features in image segmentation</article-title>
          ,
          <source>IEEE transactions on medical imaging 39</source>
          (
          <year>2019</year>
          )
          <fpage>1856</fpage>
          -
          <lpage>1867</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Papandreou, I. Kokkinos,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Yuille</surname>
          </string-name>
          ,
          <article-title>Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>40</volume>
          (
          <year>2017</year>
          )
          <fpage>834</fpage>
          -
          <lpage>848</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Zuo,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Meng</surname>
          </string-name>
          , L. Zhang,
          <article-title>Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising</article-title>
          ,
          <source>in: IEEE Conference on Computer Vision</source>
          and Pattern Recognition, IEEE,
          <year>2017</year>
          , pp.
          <fpage>5743</fpage>
          -
          <lpage>5752</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Medical image reconstruction with multi-level deep learning denoiser and tight frame regularization</article-title>
          ,
          <source>Applied Mathematics and Computation</source>
          <volume>477</volume>
          (
          <year>2024</year>
          )
          <fpage>128795</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          , A.-r. Mohamed, G. Hinton,
          <article-title>Speech recognition with deep recurrent neural networks</article-title>
          ,
          <source>in: 2013 IEEE international conference on acoustics, speech and signal processing</source>
          , Ieee,
          <year>2013</year>
          , pp.
          <fpage>6645</fpage>
          -
          <lpage>6649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Barron</surname>
          </string-name>
          ,
          <article-title>Universal approximation bounds for superpositions of a sigmoidal function</article-title>
          ,
          <source>IEEE Transactions on Information Theory</source>
          <volume>39</volume>
          (
          <year>1993</year>
          )
          <fpage>930</fpage>
          -
          <lpage>945</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.-X. Zhou</surname>
          </string-name>
          ,
          <article-title>Approximation of smooth functionals using deep relu networks</article-title>
          ,
          <source>Neural Networks</source>
          <volume>166</volume>
          (
          <year>2023</year>
          )
          <fpage>424</fpage>
          -
          <lpage>436</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>Deep nonparametric estimation of operators between infinite dimensional spaces</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>25</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>W. E</surname>
          </string-name>
          ,
          <article-title>A proposal on machine learning via dynamical systems</article-title>
          ,
          <source>Communications in Mathematics and Statistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . doi:10.1007/s40304-017-0103-z.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Benning</surname>
          </string-name>
          , E. Celledoni,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ehrhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Owren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schönlieb</surname>
          </string-name>
          ,
          <article-title>Deep learning as optimal control problems: models and numerical methods</article-title>
          ,
          <source>Journal of Computational Dynamics</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gregor</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          <article-title>LeCun, Learning fast approximations of sparse coding</article-title>
          ,
          <source>in: Proceedings of the 27th international conference on international conference on machine learning</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>399</fpage>
          -
          <lpage>406</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Admm-csnet: A deep learning approach for image compressive sensing</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>42</volume>
          (
          <year>2018</year>
          )
          <fpage>521</fpage>
          -
          <lpage>538</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ruthotto</surname>
          </string-name>
          , E. Haber,
          <article-title>Deep neural networks motivated by partial diferential equations</article-title>
          ,
          <source>Journal of Mathematical Imaging and Vision</source>
          <volume>62</volume>
          (
          <year>2020</year>
          )
          <fpage>352</fpage>
          -
          <lpage>364</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Haber</surname>
          </string-name>
          , L. Ruthotto,
          <article-title>Stable architectures for deep neural networks</article-title>
          ,
          <source>Inverse Problems</source>
          <volume>34</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . doi:10.1088/1361-6420/aa9a90. arXiv:1705.03341.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Weak adversarial networks for high-dimensional partial diferential equations</article-title>
          ,
          <source>Journal of Computational Physics</source>
          <volume>411</volume>
          (
          <year>2020</year>
          )
          <fpage>109409</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zou</surname>
          </string-name>
          , WANCO:
          <article-title>Weak adversarial networks for constrained optimization problems</article-title>
          , arXiv preprint arXiv:2407.03647 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>X.-C.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <article-title>PottsMGNet: A mathematical explanation of encoder-decoder based neural networks</article-title>
          ,
          <source>SIAM Journal on Imaging Sciences</source>
          <volume>17</volume>
          (
          <year>2024</year>
          )
          <fpage>540</fpage>
          -
          <lpage>594</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-C.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <article-title>Connections between operator-splitting methods and deep neural networks with applications in image segmentation</article-title>
          ,
          <source>Ann. Appl. Math</source>
          <volume>39</volume>
          (
          <year>2023</year>
          )
          <fpage>406</fpage>
          -
          <lpage>428</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          , J. Liu,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-C.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <article-title>Double-well net for image segmentation</article-title>
          ,
          <source>arXiv preprint arXiv:2401.00456</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          , J. Xu,
          <article-title>MgNet: A unified framework of multigrid and convolutional neural network</article-title>
          ,
          <source>Science China Mathematics</source>
          <volume>62</volume>
          (
          <year>2019</year>
          )
          <fpage>1331</fpage>
          -
          <lpage>1354</lpage>
          . doi:10.1007/s11425-019-9547-2. arXiv:1901.10415.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Glowinski</surname>
          </string-name>
          ,
          <article-title>Finite element methods for incompressible viscous flow</article-title>
          ,
          <source>Handbook of numerical analysis 9</source>
          (
          <year>2003</year>
          )
          <fpage>3</fpage>
          -
          <lpage>1176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] R. Glowinski, T.-W. Pan, X.-C. Tai, Some facts about operator-splitting and alternating direction methods, in: Splitting Methods in Communication, Imaging, Science, and Engineering, Springer, 2016, pp. 19-94.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] T. Lu, P. Neittaanmaki, X.-C. Tai, A parallel splitting-up method for partial differential equations and its applications to Navier-Stokes equations, ESAIM: Mathematical Modelling and Numerical Analysis 26 (1992) 673-708.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] X. Tai, L. Li, E. Bae, The Potts model with different piecewise constant representations and fast algorithms: a survey, Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging: Mathematical Imaging and Vision (2021) 1-41.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] X.-C. Tai, H. Liu, R. H. Chan, L. Li, A mathematical explanation of UNet, Mathematical Foundations of Computing (2024) 0-0.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] T. F. Chan, T. P. Mathew, Domain decomposition algorithms, Acta Numerica 3 (1994) 61-143.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] J. Xu, Iterative methods by space decomposition and subspace correction, SIAM Review 34 (1992) 581-613.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] X.-C. Tai, Rate of convergence for some constraint decomposition methods for nonlinear variational inequalities, Numerische Mathematik 93 (2003) 755-786.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] X.-C. Tai, J. Xu, Global and uniform convergence of subspace correction methods for some convex optimization problems, Mathematics of Computation 71 (2002) 105-124.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>