<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xue-Cheng Tai</string-name>
          <email>xtai@norceresearch.no</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hao Liu</string-name>
          <email>haoliu@hkbu.edu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raymond H. Chan</string-name>
          <email>raymond.chan@ln.edu.hk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lingfeng Li</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics, Hong Kong Baptist University</institution>
          ,
          <addr-line>Kowloon Tong, Kowloon</addr-line>
          ,
          <country>Hong Kong SAR</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hong Kong Centre for Cerebro-Cardiovascular Health Engineering</institution>
          ,
          <country>Hong Kong SAR</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Lingnan University</institution>
          ,
          <addr-line>Tuen Mun</addr-line>
          ,
          <country>Hong Kong SAR</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Norwegian Research Centre</institution>
          ,
          <addr-line>Nygårdsgaten 112, 5008 Bergen</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work, we propose a general framework for designing neural network architectures inspired by dynamical differential equations, utilizing the operator-splitting technique. The central idea is to treat neural network design as the discretization of a continuous-time optimal control problem, where the underlying dynamics are governed by differential equations serving as constraints; the resulting scheme is then unrolled as a network. These dynamics are discretized through operator-splitting schemes, which allow complex evolution equations to be decomposed into simpler substeps. Each substep in the splitting scheme is then unrolled and interpreted as a layer in a neural network, with certain control variables modeled as learnable parameters. This formulation provides a principled way to incorporate prior knowledge about dynamics and structure into the network design. Using our theory, we give a rigorous mathematical explanation of the well-known UNet and show that it is a discretization of a simple differential equation. By adding regularization to UNet, we can also derive the PottsMGNet through our proposed framework.</p>
      </abstract>
      <kwd-group>
        <kwd>deep neural network</kwd>
        <kwd>control problem</kwd>
        <kwd>operator splitting</kwd>
        <kwd>UNet</kwd>
        <kwd>image segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Tai et al. [18] addressed image segmentation across noise levels via a unified framework grounded in multigrid and control
perspectives. Notably, their analysis revealed that many encoder-decoder networks implicitly implement
operator-splitting algorithms for control problems. Further advancing this interplay, Liu et al. [20]
proposed a double-well network that learns region force terms in the Potts model through neural
representations. We also want to mention that the work [21] has provided another way to interpret
neural networks through multigrid techniques.</p>
      <p>In this work, we aim to present a general framework for designing neural networks based on dynamical differential equations using the operator-splitting technique. The basic idea is to discretize a continuous control problem, with the dynamical system as a constraint, by operator-splitting schemes. We then unroll the splitting scheme as a neural network, with some control variables parameterized as learnable modules. We also illustrate this approach through the examples of UNet and PottsMGNet.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Central Ideas</title>
      <p>A central insight of our work is that deep neural networks (DNNs) can be naturally derived as discretizations of continuous control problems governed by partial differential equations (PDEs). This perspective provides a unifying framework to design and interpret network architectures through dynamical systems and optimal control.</p>
      <p>Consider a continuous control problem where the state evolution is described by a PDE of the form
$$\begin{cases} \dfrac{\partial u}{\partial t} = \mathcal{A}(u, \theta) &amp; \text{for } (x, t) \in \Omega \times [0, T], \\ u(0) = u_0 &amp; \text{in } \Omega, \end{cases} \tag{1}$$
where $\Omega$ is the spatial computational domain and $\mathcal{A}$ is a differential operator parameterized by the control variable $\theta$, which could depend on both $x$ and $t$ or on one of them. The function $u_0$ is the initial condition. By properly discretizing this PDE in time by operator-splitting methods and unrolling the dynamics over discrete steps, we can derive a numerical scheme for solving (1). It is shown that this scheme has the same architecture as a network, for which the control variables play the role of network learnable parameters. Based on this scheme, one optimizes the control variables to minimize a given cost functional so that $u(T)$ matches the target state. The optimization procedure is the same as network training.</p>
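      <p>To make this concrete, the following is a minimal sketch (our illustration, not code from any cited work) that unrolls an explicit Euler discretization of (1), $u^{n+1} = u^n + \tau \mathcal{A}(u^n, \theta^n)$, into a network; the class names are hypothetical and $\mathcal{A}$ is modeled, for illustration only, as a small convolution:</p>
      <preformat>
# Minimal sketch (ours): unrolling the explicit Euler discretization
#   u^{n+1} = u^n + tau * A(u^n, theta^n)
# of the control problem (1) into a residual network.
import torch
import torch.nn as nn

class ControlledDynamics(nn.Module):
    """One learnable realization of A(u, theta); here simply a 3x3 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, u):
        return self.conv(u)

class UnrolledNet(nn.Module):
    """Each time step t_n owns its control variables theta^n, i.e. one layer per step."""
    def __init__(self, channels, num_steps, tau):
        super().__init__()
        self.layers = nn.ModuleList(ControlledDynamics(channels) for _ in range(num_steps))
        self.tau = tau

    def forward(self, u0):
        u = u0
        for A_n in self.layers:        # n = 0, ..., N - 1
            u = u + self.tau * A_n(u)  # explicit Euler step: a ResNet-like layer
        return u

# Training optimizes the control variables Theta = {theta^n} so that the output
# matches the target state u(T), e.g. by minimizing a mean-squared loss.
      </preformat>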
      <p>This is a general framework that applies to all networks. From another perspective, it also implies that for any given network, we can find a corresponding control problem so that this network is a numerical scheme (with proper discretization and unrolling) solving this problem. This framework offers several advantages:
• Theoretical analysis: Tools from PDE theory (e.g., stability, convergence, and regularity analysis) can be applied to study the behavior of DNNs.
• Architecture design: New network architectures can be derived by choosing appropriate PDE models and discretization schemes. This makes it possible to equip the designed neural networks with different kinds of physical properties that are often desired but have not been achievable with existing networks.
• Scalability, interpretability and explainability: The continuous viewpoint provides insights into the role of depth, initialization, and parameterization in DNNs.</p>
      <p>In the following sections, we give the details of this idea and demonstrate how to use operator-splitting methods and unrolling to discretize the control problems. Applications of this framework to UNet [1] and PottsMGNet [18] will be discussed.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Discretization by Operator-Splitting Methods</title>
      <p>Assume that we have decomposed the differential operator $\mathcal{A}$ into the following form:
$$\frac{\partial u}{\partial t} = \sum_{i=1}^{m} \mathcal{A}_i(u, \theta_i), \qquad u(0) = u_0,$$
where each $\mathcal{A}_i(u, \theta_i)$ is possibly nonlinear and acts as a distinct operator. Operator-splitting methods approximate the full solution by evolving $u$ through each operator sequentially or in parallel over a time step $\tau$. For example, the sequential splitting [22, 23] evolves the solution by applying the sub-solvers sequentially:
$$u^{n+1} = \mathcal{S}_{m,\tau} \circ \cdots \circ \mathcal{S}_{1,\tau}(u^n),$$
where $\mathcal{S}_{i,\tau}(\hat{u})$ denotes the solution operator of
$$\frac{\partial v}{\partial t} = \mathcal{A}_i(v, \theta_i), \qquad v(t_n) = \hat{u},$$
evaluated at $t_n + \tau$. The solution operator $\mathcal{S}_{i,\tau}(\hat{u})$ can either be computed explicitly or approximated numerically. The parallel splitting scheme [24] solves each sub-problem in parallel and combines the results:
$$u^{n+1} = \frac{1}{m} \sum_{i=1}^{m} \mathcal{S}_{i,\tau}(u^n).$$
We may also apply the sequential and parallel schemes together to form hybrid splitting schemes, as explained in Appendix A; see also [18, Appendix D] for more details.</p>
      <p>After splitting, unrolling methods are used to construct neural networks. Unrolling maps the splitting scheme to a neural network architecture, where each operator $\mathcal{A}_i$ becomes a network layer. Specifically, we have the following construction:
1. Layer structure: Each time step $u^n \to u^{n+1}$ is unrolled into $m$ sub-layers, one per operator $\mathcal{A}_i$. If $\mathcal{A}_i$ is known (e.g., a Laplacian), we implement $\mathcal{S}_{i,\tau}$ as a fixed layer. If $\mathcal{A}_i$ contains unknown learnable parameters, we represent $\mathcal{S}_{i,\tau}$ as a trainable neural network block.
2. Parameterization: Replace each unknown $\mathcal{A}_i$ with a learnable module $\hat{\mathcal{A}}_i(u, \theta_i)$ with parameters $\theta_i(t_n)$. The operator $\mathcal{S}_{i,\tau}$ then becomes a learnable layer $\hat{\mathcal{S}}_{i,\tau}$.
3. Unrolled network: By replacing $\mathcal{S}_{i,\tau}$ with $\hat{\mathcal{S}}_{i,\tau}$ in the splitting scheme, we get a neural network model $\mathcal{N}_\Theta$ with learnable parameters $\Theta = \{\theta_i(t_n)\}_{i=1,n=1}^{m,N}$.</p>
      <p>This unrolled network can be trained by minimizing the discrepancy between the predictions from given initial conditions $u_0^s$ and the corresponding ground-truth solutions $u_T^s$ (e.g., MSE):
$$\mathcal{L}(\Theta) = \sum_{s=1}^{\hat{S}} \big\| \mathcal{N}_\Theta(u_0^s) - u_T^s \big\|^2,$$
where $\{u_0^s, u_T^s\}_{s=1}^{\hat{S}}$ is a set of training samples. Let us emphasize that we can use other loss functions instead of MSE, such as the cross-entropy loss. Different deep neural networks in the literature are just different approximations of some specially chosen PDEs or ODEs. In the following section, we will show that the well-known UNet [1] is just an unrolling of a numerical approximation of a special PDE, see (2).</p>
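      <p>As a hedged illustration of this construction (ours; the architecture of the learnable sub-solvers is an assumption, not prescribed by the theory), the following sketch unrolls one time step of the sequential scheme and of the parallel scheme with learnable sub-solvers $\hat{\mathcal{S}}_{i,\tau}$:</p>
      <preformat>
# Sketch (ours): one unrolled time step of the sequential and parallel splitting schemes.
import torch
import torch.nn as nn

class SubSolver(nn.Module):
    """Learnable layer S_hat_{i,tau} approximating the solution operator of
    dv/dt = A_i(v, theta_i) over one time step tau."""
    def __init__(self, channels, tau):
        super().__init__()
        self.block = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.tau = tau

    def forward(self, u):
        return u + self.tau * self.block(u)

def sequential_step(u, solvers):
    # u^{n+1} = S_{m,tau} o ... o S_{1,tau} (u^n)
    for S in solvers:
        u = S(u)
    return u

def parallel_step(u, solvers):
    # u^{n+1} = (1/m) * sum_i S_{i,tau}(u^n)
    return sum(S(u) for S in solvers) / len(solvers)

solvers = [SubSolver(channels=8, tau=0.1) for _ in range(4)]   # m = 4 sub-operators
u1 = sequential_step(torch.rand(1, 8, 32, 32), solvers)
u2 = parallel_step(torch.rand(1, 8, 32, 32), solvers)
      </preformat>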
    </sec>
    <sec id="sec-4">
      <title>4. UNet as Multigrid Operator Splitting</title>
      <p>The well-known deep neural network UNet [1] is designed to segment images into foreground and background, represented by a binary image. By applying the framework of Sections 2-3, we show that it is just a numerical approximation of a PDE.</p>
      <sec id="sec-4-1">
        <title>4.1. The control problem</title>
        <p>Given an input image $f$ defined on the image domain $\Omega$, we consider the following initial value problem
$$\begin{cases} \dfrac{\partial u}{\partial t}(x, t) = W(x, t) * u(x, t) + b(t) - \varepsilon \ln \dfrac{u(x, t)}{1 - u(x, t)}, &amp; (x, t) \in \Omega \times (0, T], \\ u(x, 0) = G(f(x)), &amp; x \in \Omega, \end{cases} \tag{2}$$
where $*$ denotes convolution, and $W(x, t)$, $b(t)$ are control variables which will be learned in the training process. The convolution kernel $W(x, t) : D \times (0, T] \to \mathbb{R}$ is supported on some spatial domain $D \subset \mathbb{R}^2$. In practice, $D$ can be different from $\Omega$, and is usually a small region. In this paper, for simplicity, we take $D = \Omega = [-1, 1]^2$. When computing convolutions, we extend functions to $\mathbb{R}^2$ by padding the convolution kernels $W(x, t)$ by 0 and the solution function $u(x, t)$ periodically.</p>
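        <p>As an illustration (ours; the kernel, bias and $\varepsilon$ below are placeholders), the right-hand side of (2) can be evaluated with the periodic extension of $u$ described above:</p>
        <preformat>
# Sketch (ours): evaluating the right-hand side of (2),
#   W * u + b - eps * ln(u / (1 - u)),
# with u extended periodically, as described in the text.
import torch
import torch.nn.functional as F

def rhs(u, W, b, eps=1.0):
    # u: (1, 1, H, W_img) with values in (0, 1); W: (1, 1, S, S) kernel; b: scalar
    S = W.shape[-1]
    u_pad = F.pad(u, (S // 2,) * 4, mode="circular")    # periodic extension of u
    conv = F.conv2d(u_pad, W)                           # the convolution W * u
    return conv + b - eps * torch.log(u / (1.0 - u))    # entropy term keeps u in (0, 1)

u0 = torch.full((1, 1, 8, 8), 0.5)      # an initial probability map G(f)
W = 0.1 * torch.randn(1, 1, 3, 3)       # placeholder learnable kernel
print(rhs(u0, W, b=0.0).shape)          # torch.Size([1, 1, 8, 8])
        </preformat>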
        <p>Equation (2) evolves any input image $f$ into a probability function $u(x, T) \in [0, 1]$; see [18, Appendix A] and [25] for more details and the derivation of this equation. Above, $G(f)$ is some operation that generates the initial condition from $f$. Due to the appearance of the term $\varepsilon \ln \frac{u}{1-u}$, the solution of the above equation is forced to stay in $(0, 1)$. For numerical considerations, and to make the connection between operator-splitting methods and neural networks clearer, we introduce a constraint $u(x, t) \geq 0$ into the control problem. Due to the property of the term $\varepsilon \ln \frac{u}{1-u}$, the introduced constraint does not change the solution. Next, we incorporate the constraint into the equation by introducing an indicator function:
$$\begin{cases} \dfrac{\partial u}{\partial t} - W(x, t) * u - b(t) + \varepsilon \ln \dfrac{u}{1 - u} + \partial \mathcal{I}_\Sigma(u) \ni 0, &amp; (x, t) \in \Omega \times (0, T], \\ u(x, 0) = G(f(x)), &amp; x \in \Omega, \end{cases} \tag{3}$$
where
$$\Sigma = \{ u : u(x, t) \geq 0 \text{ for } (x, t) \in \Omega \times (0, T] \},$$
$\mathcal{I}_\Sigma$ is the indicator function of $\Sigma$ and $\partial \mathcal{I}_\Sigma$ denotes the subdifferential of $\mathcal{I}_\Sigma$. By solving (3) for any input image $f$, the initial value $u(x, 0)$ evolves to $u(x, T)$, which is a probability function. By choosing $\varepsilon$ small, we can force this probability function to be close to a binary function. For simplicity of explanation, we will take $\varepsilon = 1$, which is also the value the original UNet uses.</p>
        <p>To solve (3) numerically, [26] decomposed the control variables (learnable variables) $\{W(x, t), b(t)\}$ in (3) using the multigrid idea. Here, we introduce a simplified version of their algorithm that recovers a simple UNet-like structure. The discussion can be generalized to recover the original UNet structure from (3).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Decomposition of control variables</title>
        <p>In traditional multigrid methods, a popular framework is the "fine grid → coarse grid → fine grid" strategy [27, 28]. Such a form of V-cycle multigrid method can be interpreted as space decomposition and subspace correction [29, 30].</p>
        <p>Motivated by the fact that images can be viewed as piecewise constant functions in practice, we consider piecewise constant approximations at different resolutions. Let $J &gt; 0$ be some integer. For the finest resolution, we discretize $\Omega$ into $(2^J)^2$ small squares, each of size $2^{-J+1} \times 2^{-J+1}$. They are called pixels in image processing and rectangular elements in finite element methods. In real applications, the finest mesh is the grid on which the input image is given. Denote $h_1 = 2^{-J+1}$ and this set of pixels by $\mathcal{T}_1$. Starting with $\mathcal{T}_1$, we can define a sequence of coarser grids $\{\mathcal{T}_j\}_{j=1}^{J+1}$. Specifically, let $h_j = 2^{j-1} h_1 = 2^{j-J}$. The set $\mathcal{T}_j$ consists of coarse pixels of size $h_j \times h_j$ that form a partition of $\Omega$. For each $\mathcal{T}_j$, we can define a set of piecewise constant basis functions: for the $k$-th pixel in $\mathcal{T}_j$, denoted by $\Omega_j^k$, we define $\psi_j^k(x)$ such that it equals 1 if $x \in \Omega_j^k$ and 0 otherwise. We denote the space spanned by this set of functions by $V_j$. We have
$$V_{J+1} \subset V_J \subset \cdots \subset V_1. \tag{4}$$
Note that each function in $V_j$ has $2^{2(J+1-j)}$ coefficients and $\dim(V_j) = 2^{2(J+1-j)}$. Traditional multigrid methods solve the decomposed subproblems by simple Gauss-Seidel or Jacobi iterations. Here, the multigrid spaces are used in a different way.</p>
        <p>We will decompose $\{W(x, t), b(t)\}$ into a sum of variables of different scales over the multigrid hierarchy. Then, we use a hybrid splitting method to solve (3) so that all decomposed variables are distributed into several subproblems, which are solved sequentially or in parallel. Within one iteration of the splitting method, all decomposed variables are swept through. The general splitting idea is to split the operators based on a V-cycle according to the grid level, cf. Figure 1 for an illustration. We decompose all terms on the right-hand side of (3) via the following steps; a PyTorch reading of steps 3 and 4 is sketched after this list.</p>
        <p>1. According to the idea of a V-cycle, we decompose $W(x, t)$ and $b(t)$ as
$$W(x, t) = \bar{W}(x, t) + \tilde{W}(x, t), \qquad b(t) = \bar{b}(t) + \tilde{b}(t). \tag{5}$$
These variables will be further decomposed next. Above, $\bar{W}, \bar{b}$ are sums of control variables in the left branch of the V-cycle, and $\tilde{W}, \tilde{b}$ are sums of control variables in the right branch; see the illustration in Figure 1. We also decompose the nonlinear operations $-\varepsilon \ln \frac{u}{1-u}$ and $\partial \mathcal{I}_\Sigma(u)$ into $\bar{N}(u) + \tilde{N}(u)$, where $\bar{N}(u)$ contains the nonlinear operations in the left branch and $\tilde{N}(u)$ contains the nonlinear operations in the right branch. In particular, we put $-\varepsilon \ln \frac{u}{1-u}$ in $\tilde{N}(u)$ only, i.e., $\bar{N}(u) = \partial \mathcal{I}_\Sigma(u)$ and $\tilde{N}(u) = \partial \mathcal{I}_\Sigma(u) - \varepsilon \ln \frac{u}{1-u}$. Later, we will show that our operator-splitting method recovers a simplified UNet, in which the operation $-\varepsilon \ln \frac{u}{1-u}$ corresponds to the sigmoid layer at the end of UNet.</p>
        <p>2. We further decompose the operators into components at different grid levels:
$$\bar{W}(x, t) = \sum_{j=1}^{J} \bar{W}_j(x, t), \qquad \bar{b}(t) = \sum_{j=1}^{J} \bar{b}_j(t), \qquad \bar{N}(u) = \sum_{j=1}^{J} \bar{N}_j(u), \tag{6}$$
$$\tilde{W}(x, t) = \sum_{j=1}^{J} \tilde{W}_j(x, t) + W^*(x, t), \qquad \tilde{b}(t) = \sum_{j=1}^{J} \tilde{b}_j(t) + b^*(t), \qquad \tilde{N}(u) = \sum_{j=1}^{J} \tilde{N}_j(u) + N^*(u), \tag{7}$$
where $\bar{W}_j, \bar{b}_j, \tilde{W}_j, \tilde{b}_j$ contain the control variables at grid level $j$, and $W^*, b^*$ are control variables applied to the output of the V-cycle at the finest mesh. The operators $\bar{N}_j(u) = \tilde{N}_j(u) = \partial \mathcal{I}_\Sigma(u)$ are applied to the intermediate solution at grid level $j$, while the operator $N^*(u) = -\varepsilon \ln \frac{u}{1-u}$ is applied to the output of the V-cycle at the finest mesh.</p>
        <p>3. At grid level $j$ in the left branch, $\bar{W}_j$ is defined on $V_j$, which contains $2^{2(J+1-j)}$ coefficients to be learned. This leads to a high complexity when $j$ is small. In order to decrease the complexity, we decompose $\bar{W}_j$ into a sum of components $\bar{W}_j^k$, each of which is nonzero only on a small patch $D_j^k$ of size $(h_j S) \times (h_j S)$; the patches are denoted by $\{D_j^k\}_{k=1}^{K_j}$, where $K_j$ is their total number. Normally, we take $S = 3$ or $5$, and all $D_j^k$ together shall cover $\Omega$; the patches for $S = 3$, $j = 1$ are illustrated in Figure 1. Each $\bar{W}_j^k$ equals $\bar{W}_j$ over $D_j^k$ if the support sets $D_j^k$ do not overlap. We also decompose $\bar{b}_j$ and $\bar{N}_j$ into sums of $K_j$ components. Thus we have
$$\bar{W}_j(x, t) = \sum_{k=1}^{K_j} \bar{W}_j^k(x, t), \qquad \bar{b}_j(t) = \sum_{k=1}^{K_j} \bar{b}_j^k(t), \qquad \bar{N}_j(u) = \sum_{k=1}^{K_j} \bar{N}_j^k(u). \tag{8}$$
Note that the $\bar{W}_j^k$'s have different supports, which makes specifying the support of each function complicated. In order to simplify the settings, we shift each $\bar{W}_j^k$ so that it is centered at $(0, 0)$. Specifically, for each $\bar{W}_j^k$, let $\delta_j^k$ be a shifting kernel so that $\bar{W}_j^k = \delta_j^k * \hat{W}_j^k$, where $\hat{W}_j^k$ is a shifted version of $\bar{W}_j^k$ supported on a square centered at $(0, 0)$. According to the shift-invariance property of convolution, we have
$$\bar{W}_j(x, t) * u(x, t) = \sum_{k=1}^{K_j} \bar{W}_j^k(x, t) * u(x, t) = \sum_{k=1}^{K_j} \big( \delta_j^k(x) * \hat{W}_j^k(x, t) \big) * u(x, t). \tag{9}$$
Thus, learning a kernel with $2^{2(J+1-j)}$ coefficients is converted to learning $K_j$ kernels $\hat{W}_j^k(x, t)$ with $S \times S$ coefficients around the origin. In implementations, $K_j$ corresponds to the number of channels in a CNN, and $\{\hat{W}_j^k\}_{k=1}^{K_j}$ corresponds to CNN convolution kernels of size $S \times S$. In our experiments, for simplicity, we omit the shifting kernel $\delta_j^k$ and replace $(\delta_j^k(x) * \hat{W}_j^k(x, t)) * u(x, t)$ by $\hat{W}_j^k(x, t) * u(x, t)$, and it gives good results. This is also what is done in the original UNet. The same decomposition is done for $\tilde{W}_j$, $\tilde{b}_j$ and $\tilde{N}_j$.</p>
        <p>4. For each grid level $j$ and each channel $k$, we further decompose
$$\hat{W}_j^k(x, t) = \sum_{i=1}^{K_{j-1}} \hat{W}_j^{k,i}(x, t), \tag{10}$$
where each $\hat{W}_j^{k,i}(x, t)$ has support around the origin with $S \times S$ coefficients. The purpose of this decomposition is to increase the number of parameters of the training variables. In our algorithm, each channel computes an intermediate result; in the computation, $\hat{W}_j^{k,i}$ is used to convolve with the intermediate solution in the $i$-th channel at grid level $j - 1$. The same decomposition is done for the right-branch variables. Before the decomposition, the control variables are $\{W(x, t), b(t)\}$. After these decompositions, the control variables are $\hat{W}_j^{k,i}(x, t)$, $\bar{b}_j^k(t)$, $\hat{\tilde{W}}_j^{k,i}(x, t)$, $\tilde{b}_j^k(t)$, $W^{*,k}(x, t)$ and $b^*(t)$:
$$\bar{W}(x, t) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \sum_{i=1}^{K_{j-1}} \hat{W}_j^{k,i}(x, t), \tag{11}$$
$$\tilde{W}(x, t) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \sum_{i=1}^{K_{j-1}} \hat{\tilde{W}}_j^{k,i}(x, t) + \sum_{k=1}^{K_1} W^{*,k}(x, t), \tag{12}$$
$$\bar{b}(t) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \bar{b}_j^k(t), \qquad \tilde{b}(t) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \tilde{b}_j^k(t) + b^*(t), \tag{13}$$
and the operators $\bar{N}(u)$, $\tilde{N}(u)$ are decomposed as
$$\bar{N}(u) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \bar{N}_j^k(u), \qquad \tilde{N}(u) = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \tilde{N}_j^k(u) + N^*(u), \tag{14}$$
with $\bar{N}_j^k(u) = \tilde{N}_j^k(u) = \partial \mathcal{I}_\Sigma(u)$ and $N^*(u) = -\varepsilon \ln \frac{u}{1-u}$.</p>
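        <p>In PyTorch terms, steps 3 and 4 say that all decomposed kernels $\{\hat{W}_j^{k,i}\}$ at one grid level act together as a single multi-channel convolution; a minimal sketch (ours, with illustrative channel counts):</p>
        <preformat>
# Sketch (ours): the kernels {W_hat_j^{k,i}} at grid level j form one multi-channel
# convolution with K_{j-1} input channels, K_j output channels and S x S taps each.
import torch.nn as nn

S = 3                   # patch/kernel size; S = 3 or 5 in the text
K_prev, K_j = 16, 32    # illustrative channel counts K_{j-1} and K_j

conv_level_j = nn.Conv2d(
    in_channels=K_prev,       # one kernel W_hat_j^{k,i} per input channel i
    out_channels=K_j,         # one output channel per patch index k
    kernel_size=S,
    padding=S // 2,
    padding_mode="circular",  # periodic extension, as in Section 4.1
)
        </preformat>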
        <p>The original PDE (2) is turned into
$$\begin{cases} \dfrac{\partial u}{\partial t} = \bar{W} * u + \tilde{W} * u + \bar{b} + \tilde{b} + \bar{N}(u) + \tilde{N}(u), &amp; (x, t) \in \Omega \times [0, T], \\ u(x, 0) = G(f), &amp; x \in \Omega. \end{cases} \tag{15}$$
To solve (15), we use a hybrid splitting method shown in Appendix A. Divide the time interval $[0, T]$ into $N_t$ subintervals with time step $\tau = T / N_t$. Denote the computed solution at time $t_n = n\tau$ by $u^n$. The resulting algorithm that updates $u^n$ to $u^{n+1}$ is summarized in Algorithm 1, where $\mathcal{D}$ and $\mathcal{U}$ represent the downsampling and upsampling operators, respectively. For simplicity, variable dependencies on $x$ are omitted.</p>
        <p>[Algorithm 1: one time step of the hybrid splitting method for (15). Data: the solution $u^n$ at time $t_n$. Sweeping down and up the V-cycle, at each grid level $j$ compute the channel-wise intermediate solutions $u_j^k \in V_j$ by solving subproblems of the form (16) and (18), with relaxation steps (17) between the levels; combine the channels as the average $\frac{1}{K_j} \sum_{k=1}^{K_j} u_j^k$; the output layer (19) then produces $u^{n+1}$.]</p>
      </sec>
      <sec id="sec-4-3a">
        <title>4.3. On the solutions to (16), (18) and (19)</title>
        <p>Observe that (16) and (18) are of the form
$$\frac{u - u^*}{\tau} - \sum_{i=1}^{I} \hat{W}_i * u_i^* - \hat{b} + \partial \mathcal{I}_\Sigma(u) \ni 0, \tag{20}$$
where $\tau$ is some constant, $u^* = \frac{1}{I} \sum_{i=1}^{I} u_i^*$ is known, each $\hat{W}_i$ is a convolution kernel, $I$ is the number of input channels, and $\hat{b}$ is a bias function. The solution to (20) is computed by a two-sub-step splitting method:
$$\begin{cases} \bar{u} = u^* + \tau \Big( \sum_{i=1}^{I} \hat{W}_i * u_i^* + \hat{b} \Big), \\ \dfrac{u - \bar{u}}{\tau} + \partial \mathcal{I}_\Sigma(u) \ni 0. \end{cases} \tag{21}$$
In (21), there is no difficulty in solving for $\bar{u}$ in the first sub-step, as it is an explicit step. Let us emphasize that the term $\sum_{i=1}^{I} \hat{W}_i * u_i^* + \hat{b}$ in this step is exactly the PyTorch function Conv2d. For $u$ in the second sub-step, the inclusion is, in fact, a projection onto $\Sigma$. Its closed-form solution is given as
$$u = \max\{\bar{u}, 0\} = \mathrm{ReLU}(\bar{u}), \tag{22}$$
where $\mathrm{ReLU}(z) = \max\{z, 0\}$ is the rectified linear unit.</p>
        <p>Problem (19) is of the form
$$\frac{u^{n+1} - \bar{u}_1}{\tau} - W^*(t_n) * \bar{u}_1 - b^*(t_n) - N^*(u^{n+1}) \ni 0, \tag{23}$$
where $\bar{u}_1$ denotes the output of the V-cycle. Following the steps for solving (16) and (18) above, we solve (23) by the two-sub-step splitting
$$\begin{cases} \bar{u} = \bar{u}_1 + \tau \big( W^*(t_n) * \bar{u}_1 + b^*(t_n) \big), \\ \dfrac{u^{n+1} - \bar{u}}{\tau} = -\varepsilon \ln \dfrac{u^{n+1}}{1 - u^{n+1}}. \end{cases} \tag{24}$$
The first sub-step is again an explicit step using the PyTorch function Conv2d. We solve the second sub-step approximately by a fixed point iteration. Initialize $u_0 = \bar{u}$. Given $u_p$, we update $u_{p+1}$ by solving
$$\frac{u_p - \bar{u}}{\tau} = -\varepsilon \ln \frac{u_{p+1}}{1 - u_{p+1}}, \tag{25}$$
for which we have the closed-form solution
$$u_{p+1} = \mathrm{Sig}\Big( -\frac{u_p - \bar{u}}{\varepsilon \tau} \Big), \tag{26}$$
where $\mathrm{Sig}(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function. By repeating (26) until $u_{p+1}$ converges to some function $u^\dagger$, we set $u^{n+1} = u^\dagger$. In particular, since $u_0 = \bar{u}$, the updating formula (26) always gives $u_1 = 0.5$. If we only consider a two-step fixed point iteration, we get
$$u^{n+1} = \mathrm{Sig}\Big( -\frac{0.5 - \bar{u}}{\varepsilon \tau} \Big) = \mathrm{Sig}\Big( \frac{\bar{u} - 0.5}{\varepsilon \tau} \Big). \tag{27}$$</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.4. Algorithm 1 recovers the UNet</title>
        <p>We first show that a building block of Algorithm 1 is equivalent to a layer of a simplified UNet. Each layer of UNet is a convolution layer activated by ReLU:
$$\begin{cases} \bar{z} = \sum_{i=1}^{I} W_i * u_i^* + b, \\ u = \mathrm{ReLU}(\bar{z}), \end{cases} \tag{28}$$
where each $W_i$ is a convolutional kernel and $b$ is the bias. In Algorithm 1, the building block is (20) and (23), which is solved by (21) and (24). In fact, (28) (or problem (24)) and (21) have the same form. Specifically, in the first equation of (21) we have
$$\bar{u} = u^* + \tau \Big( \sum_{i=1}^{I} \hat{W}_i * u_i^* \Big) + \tau \hat{b} = \sum_{i=1}^{I} \big( \mathbb{1}/I + \tau \hat{W}_i \big) * u_i^* + \tau \hat{b}, \tag{29}$$
where $\mathbb{1}$ denotes the identity kernel satisfying $\mathbb{1} * u = u$ for any function $u$. In (28), set
$$W_i = \mathbb{1}/I + \tau \hat{W}_i, \qquad b = \tau \hat{b}. \tag{30}$$
We then have $\bar{z} = \bar{u}$ and the two building blocks coincide. Essentially, Algorithm 1 and UNet are the same. Thus, we have shown that a simplified UNet structure (with only one convolution layer at each data resolution) is equivalent to one iteration of Algorithm 1. The UNet architecture consists of four components: encoder, decoder, bottleneck, and skip-connections, each of which has a corresponding component in the structure of Algorithm 1:
1. Encoder: The encoder in UNet corresponds to the left branch of the V-cycle in Algorithm 1. The number of data resolution levels corresponds to the number of grid levels $J$.
2. Decoder: The decoder in UNet corresponds to the right branch of the V-cycle in Algorithm 1.
3. Bottleneck: The bottleneck in UNet corresponds to the computations at the coarsest grid level in Algorithm 1.
4. Skip-layer connection: Skip-layer connections in UNet correspond to the relaxation steps (17) in Algorithm 1.
Therefore, a one-step operator splitting of the control problem (15) is exactly equivalent to a simplified UNet; a code sketch of this correspondence is given below.</p>
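        <p>To summarize the correspondence, here is a hedged sketch (ours; the channel widths, the additive skip connections and the pooling operators are simplifications) of the one-convolution-per-level UNet that one iteration of Algorithm 1 produces:</p>
        <preformat>
# Sketch (ours): the simplified UNet recovered from one iteration of Algorithm 1.
# Left branch = encoder, right branch = decoder, coarsest level = bottleneck,
# relaxation steps (17) = skip connections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedUNet(nn.Module):
    def __init__(self, channels=16, levels=3):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Conv2d(1 if j == 0 else channels, channels, 3, padding=1)
            for j in range(levels))
        self.up = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(levels))
        self.out = nn.Conv2d(channels, 1, 3, padding=1)   # output layer, W^* and b^*

    def forward(self, f):
        u, skips = f, []
        for conv in self.down:                    # left branch of the V-cycle
            u = F.relu(conv(u))                   # building block (21)-(22)
            skips.append(u)
            u = F.avg_pool2d(u, 2)                # downsampling operator D
        for conv, skip in zip(self.up, reversed(skips)):
            u = F.interpolate(u, scale_factor=2)  # upsampling operator U
            u = F.relu(conv(u)) + skip            # relaxation step (17) as a skip sum
        return torch.sigmoid(self.out(u) - 0.5)  # output layer via (27)

print(SimplifiedUNet()(torch.rand(1, 1, 32, 32)).shape)   # torch.Size([1, 1, 32, 32])
        </preformat>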
        <p>[Algorithm 2: a hybrid splitting method to solve the control problem (15) for PottsMGNet. Data: the solution $u^n$ at time step $t_n$. For each grid level, compute $u_j^k \in V_j$ by solving the corresponding subproblems; the output layer solves
$$\frac{u^{n+1} - \bar{u}_1}{\tau} - W^*(t_n) * \bar{u}_1 - b^*(t_n) - N^*(u^{n+1}) \ni 0.$$]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Case study: PottsMGNet</title>
      <p>In the previous section, we presented how to use the framework introduced in Sections 2-3 to show that the operator-splitting scheme of (15) is equivalent to a UNet. In this section, we show that PottsMGNet, proposed in [18], is another instance of this framework. Specifically, we will consider a modified control problem of (15) and a modified splitting scheme of Algorithm 1.</p>
      <sec id="sec-5-1">
        <title>5.1. Constructing a PottsMGNet</title>
        <p>We consider the control problem (15) with the decompositions (5)-(10). Here, the decompositions of the nonlinear operators $\bar{N}$ and $\tilde{N}$ in (14) are defined through logarithmic terms involving a variance parameter $\sigma^2$; we refer to [18] for their precise form. To solve this new system, we consider a multi-step operator-splitting algorithm (Algorithm 2), using the hybrid splitting method in Appendix A, for approximating the solution of (15) with the above given split of operators. When solving the subproblems (31) and (32), we adopt a similar sequential splitting method as in (21). The second substep is then defined as
$$u = (\mathcal{I} - \tau \tilde{N})^{-1}(\bar{u}),$$
which can be approximately solved by a fixed point iteration, as for the second substep in (24). By unrolling Algorithm 2, we can get a simplified PottsMGNet [18].</p>
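        <p>A small sketch (ours; the operator N below is a placeholder standing in for the PottsMGNet nonlinearity) of how the resolvent $u = (\mathcal{I} - \tau \tilde{N})^{-1}(\bar{u})$ can be approximated by a fixed point iteration:</p>
        <preformat>
# Sketch (ours): approximating u = (I - tau * N)^{-1}(u_bar) by the fixed point
# iteration u_{p+1} = u_bar + tau * N(u_p), as for the second sub-step in (24).
import torch

def resolvent(u_bar, N, tau=0.1, iters=10):
    u = u_bar.clone()
    for _ in range(iters):
        u = u_bar + tau * N(u)   # one fixed point sweep: u solves u - tau*N(u) = u_bar
    return u

# Placeholder nonlinearity standing in for the logarithmic terms of Section 5.1:
N = lambda u: -torch.log(u.clamp(1e-6, 1 - 1e-6) / (1 - u).clamp(1e-6, 1 - 1e-6))
u = resolvent(torch.full((4, 4), 0.3), N)
        </preformat>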
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Comparison with UNet</title>
        <p>Compared with the UNet structure, PottsMGNet starts with a control problem with different nonlinear terms $\bar{N}$ and $\tilde{N}$, which results in a different activation function at each layer of the resulting neural network. The position of the relaxation step is also different, which leads to a different skip-connection structure. Additionally, UNet is a one-step algorithm while PottsMGNet is a multi-step algorithm. Similar to UNet, we can also extend Algorithm 2 to the multi-channel case by further splitting the control variables.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work, we presented a framework to design novel neural networks using operator splitting of control problems. Networks designed in this way are usually robust to disturbances in the input, because the underlying control problems usually include regularization terms. We also presented how to derive UNet and PottsMGNet from this framework. In the future, we plan to apply this framework to other types of networks, such as graph neural networks.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used DeepSeek for grammar and spelling checking. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
    </sec>
    <sec id="sec-8">
      <title>Appendix A. Hybrid Splitting Schemes for Initial Value Problems and Neural Network Design</title>
      <p>We discuss a hybrid splitting scheme proposed in [18] as a powerful technique for solving initial value problems, which can be effectively leveraged to design and understand the architecture of deep neural networks. A hybrid splitting scheme combines the principles of parallel and sequential splitting schemes to decompose complex problems into more manageable subproblems.</p>
      <p>Consider the initial value problem:
$$\frac{\partial u}{\partial t} + \sum_{s=1}^{\bar{s}} \Big( \sum_{i=1}^{I_s} A_{s,i}(x, t; u) + B_s(x, t; u) \Big) + C(x, t) = 0 \quad \text{on } \Omega \times [0, T], \qquad u(0) = u_0.$$</p>
      <p>We first split it into $\bar{s}$ sequential steps, where each step consists of $I_s$ parallel substeps. At the $(n+1)$-th time step, denote the intermediate result computed in the $i$-th parallel branch of the $s$-th sequential step by $u_i^{n+s/\bar{s}}$, and define
$$u^{n+s/\bar{s}} = \frac{1}{I_s} \sum_{i=1}^{I_s} u_i^{n+s/\bar{s}}.$$
The function $u_i^{n+s/\bar{s}}$ is computed by solving
$$\frac{u_i^{n+s/\bar{s}} - u^{n+(s-1)/\bar{s}}}{I_s \tau} = -A_{s,i}\big(x, t_n; u^{n+(s-1)/\bar{s}}\big) - B_s\big(x, t_{n+1}; u_i^{n+s/\bar{s}}\big) - C_i^s(x, t_n),$$
where the operators $A_{s,i}$ are treated explicitly, while the operators $B_s$ are treated implicitly. The functions $C_i^s$ can be treated explicitly as shown here. Starting with $u^{n+0/\bar{s}} = u^n$, this algorithm iterates through $s$ from 1 to $\bar{s}$ and $i$ from 1 to $I_s$ to compute $u^{n+1}$. When $A_{s,i}$ and $B_s$ are Lipschitz continuous, this hybrid splitting scheme is first-order accurate [18, Theorem D.1], meaning that the error $\|u^{n+1} - u(t_{n+1})\|_\infty$ is of the order $O(\tau)$.</p>
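      <p>In code, the scheme is a sequential sweep over $s$ containing an averaged parallel sweep over $i$; a sketch (ours; the implicit sub-solver is left abstract):</p>
      <preformat>
# Sketch (ours): one time step of the hybrid splitting scheme, with s_bar sequential
# stages, each averaging I_s parallel branches (A treated explicitly, B implicitly).
def hybrid_step(u, stages, solve_branch):
    # stages: list over s of lists over i of (A_si, B_s, C_si) operator triples
    for branch_ops in stages:                  # sequential sweep: s = 1, ..., s_bar
        I_s = len(branch_ops)
        branches = []
        for A, B, C in branch_ops:             # parallel sweep: i = 1, ..., I_s
            # solve (u_i - u) / (I_s * tau) = -A(u) - B(u_i) - C for u_i,
            # with A and C explicit and B implicit:
            branches.append(solve_branch(u, A, B, C, I_s))
        u = sum(branches) / I_s                # combine the parallel branches
    return u
      </preformat>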
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          , U-net:
          <article-title>Convolutional networks for biomedical image segmentation</article-title>
          , in: International Conference on
          <article-title>Medical image computing and computer-assisted intervention</article-title>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M. R.</given-names>
            <surname>Siddiquee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tajbakhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          , UNet++:
          <article-title>Redesigning skip connections to exploit multiscale features in image segmentation</article-title>
          ,
          <source>IEEE transactions on medical imaging 39</source>
          (
          <year>2019</year>
          )
          <fpage>1856</fpage>
          -
          <lpage>1867</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Papandreou, I. Kokkinos,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Yuille</surname>
          </string-name>
          ,
          <article-title>Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>40</volume>
          (
          <year>2017</year>
          )
          <fpage>834</fpage>
          -
          <lpage>848</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Zuo,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Meng</surname>
          </string-name>
          , L. Zhang,
          <article-title>Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising</article-title>
          ,
          <source>in: IEEE Conference on Computer Vision</source>
          and Pattern Recognition, IEEE,
          <year>2017</year>
          , pp.
          <fpage>5743</fpage>
          -
          <lpage>5752</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Medical image reconstruction with multi-level deep learning denoiser and tight frame regularization</article-title>
          ,
          <source>Applied Mathematics and Computation</source>
          <volume>477</volume>
          (
          <year>2024</year>
          )
          <fpage>128795</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          , A.-r. Mohamed, G. Hinton,
          <article-title>Speech recognition with deep recurrent neural networks</article-title>
          ,
          <source>in: 2013 IEEE international conference on acoustics, speech and signal processing</source>
          , Ieee,
          <year>2013</year>
          , pp.
          <fpage>6645</fpage>
          -
          <lpage>6649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Barron</surname>
          </string-name>
          ,
          <article-title>Universal approximation bounds for superpositions of a sigmoidal function</article-title>
          ,
          <source>IEEE Transactions on Information Theory</source>
          <volume>39</volume>
          (
          <year>1993</year>
          )
          <fpage>930</fpage>
          -
          <lpage>945</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.-X. Zhou</surname>
          </string-name>
          ,
          <article-title>Approximation of smooth functionals using deep relu networks</article-title>
          ,
          <source>Neural Networks</source>
          <volume>166</volume>
          (
          <year>2023</year>
          )
          <fpage>424</fpage>
          -
          <lpage>436</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>Deep nonparametric estimation of operators between infinite dimensional spaces</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>25</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>W. E</surname>
          </string-name>
          ,
          <article-title>A proposal on machine learning via dynamical systems</article-title>
          ,
          <source>Communications in Mathematics and Statistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . doi:10.1007/s40304-017-0103-z.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Benning</surname>
          </string-name>
          , E. Celledoni,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ehrhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Owren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schönlieb</surname>
          </string-name>
          ,
          <article-title>Deep learning as optimal control problems: models and numerical methods</article-title>
          ,
          <source>Journal of Computational Dynamics</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gregor</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          <article-title>LeCun, Learning fast approximations of sparse coding</article-title>
          ,
          <source>in: Proceedings of the 27th international conference on international conference on machine learning</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>399</fpage>
          -
          <lpage>406</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Admm-csnet: A deep learning approach for image compressive sensing</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>42</volume>
          (
          <year>2018</year>
          )
          <fpage>521</fpage>
          -
          <lpage>538</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ruthotto</surname>
          </string-name>
          , E. Haber,
          <article-title>Deep neural networks motivated by partial diferential equations</article-title>
          ,
          <source>Journal of Mathematical Imaging and Vision</source>
          <volume>62</volume>
          (
          <year>2020</year>
          )
          <fpage>352</fpage>
          -
          <lpage>364</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Haber</surname>
          </string-name>
          , L. Ruthotto,
          <article-title>Stable architectures for deep neural networks</article-title>
          ,
          <source>Inverse Problems</source>
          <volume>34</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . doi:10.1088/1361-6420/aa9a90. arXiv:1705.03341.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Weak adversarial networks for high-dimensional partial diferential equations</article-title>
          ,
          <source>Journal of Computational Physics</source>
          <volume>411</volume>
          (
          <year>2020</year>
          )
          <fpage>109409</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zou</surname>
          </string-name>
          , WANCO:
          <article-title>Weak adversarial networks for constrained optimization problems</article-title>
          , arXiv preprint arXiv:2407.03647 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>X.-C.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <article-title>PottsMGNet: A mathematical explanation of encoder-decoder based neural networks</article-title>
          ,
          <source>SIAM Journal on Imaging Sciences</source>
          <volume>17</volume>
          (
          <year>2024</year>
          )
          <fpage>540</fpage>
          -
          <lpage>594</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-C.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <article-title>Connections between operator-splitting methods and deep neural networks with applications in image segmentation</article-title>
          ,
          <source>Ann. Appl. Math</source>
          <volume>39</volume>
          (
          <year>2023</year>
          )
          <fpage>406</fpage>
          -
          <lpage>428</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          , J. Liu,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-C.</given-names>
            <surname>Tai</surname>
          </string-name>
          ,
          <article-title>Double-well net for image segmentation</article-title>
          ,
          <source>arXiv preprint arXiv:2401.00456</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          , J. Xu,
          <article-title>MgNet: A unified framework of multigrid and convolutional neural network</article-title>
          ,
          <source>Science China Mathematics</source>
          <volume>62</volume>
          (
          <year>2019</year>
          )
          <fpage>1331</fpage>
          -
          <lpage>1354</lpage>
          . doi:10.1007/s11425-019-9547-2. arXiv:1901.10415.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Glowinski</surname>
          </string-name>
          ,
          <article-title>Finite element methods for incompressible viscous flow</article-title>
          ,
          <source>Handbook of numerical analysis 9</source>
          (
          <year>2003</year>
          )
          <fpage>3</fpage>
          -
          <lpage>1176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] R. Glowinski, T.-W. Pan, X.-C. Tai, Some facts about operator-splitting and alternating direction methods, in: Splitting Methods in Communication, Imaging, Science, and Engineering, Springer, 2016, pp. 19-94.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] T. Lu, P. Neittaanmaki, X.-C. Tai, A parallel splitting-up method for partial differential equations and its applications to Navier-Stokes equations, ESAIM: Mathematical Modelling and Numerical Analysis 26 (1992) 673-708.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] X. Tai, L. Li, E. Bae, The Potts model with different piecewise constant representations and fast algorithms: a survey, Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging: Mathematical Imaging and Vision (2021) 1-41.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] X.-C. Tai, H. Liu, R. H. Chan, L. Li, A mathematical explanation of UNet, Mathematical Foundations of Computing (2024) 0-0.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] T. F. Chan, T. P. Mathew, Domain decomposition algorithms, Acta Numerica 3 (1994) 61-143.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] J. Xu, Iterative methods by space decomposition and subspace correction, SIAM Review 34 (1992) 581-613.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] X.-C. Tai, Rate of convergence for some constraint decomposition methods for nonlinear variational inequalities, Numerische Mathematik 93 (2003) 755-786.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] X.-C. Tai, J. Xu, Global and uniform convergence of subspace correction methods for some convex optimization problems, Mathematics of Computation 71 (2002) 105-124.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>