<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1007/s10409-021-01148-1</article-id>
      <title-group>
        <article-title>Constraint-Guided PINNs: A Constrained Optimization Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wout Rombouts</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Quinten Van Baelen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Karsmakers</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Flanders Make @ KU Leuven</institution>
          ,
          <addr-line>B-3000 Leuven</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KU Leuven, Dept. of Computer Science</institution>
          ,
          <addr-line>Kleinhoefstraat 4, B-2440 Geel</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Leuven.AI - KU Leuven Institute for AI</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>1727</fpage>
      <lpage>1738</lpage>
      <abstract>
        <p>Physics-Informed Neural Networks (PINNs) have emerged as a powerful tool for solving Partial Differential Equations (PDEs) by integrating physical laws into the learning process. However, PINNs often struggle with training instability and the challenge of balancing multiple loss terms, which typically requires extensive hyperparameter tuning. In this paper, we introduce Constraint Guided Physics-Informed Neural Networks (CGPINNs), a novel approach that leverages Constraint Guided Gradient Descent (CGGD) to train PINNs. CGPINN reframes the learning problem as a constrained optimization task, replacing complex hyperparameter balancing with more intuitive, semantically meaningful parameters. We also propose to add two sets of constraints derived from the PDE at the initial and boundary conditions, which prevent the model from converging to trivial solutions when using CGGD. Our experiments on a simulated heat diffusion problem demonstrate that CGPINN offers a more stable and robust training procedure, effectively learning the underlying physics without the need for expensive hyperparameter searches.</p>
      </abstract>
      <kwd-group>
        <kwd>Physics-Informed Neural Networks</kwd>
        <kwd>Constraint-Guided Gradient Descent</kwd>
        <kwd>Neuro-Symbolic AI</kwd>
        <kwd>Constrained Optimization</kwd>
        <kwd>Partial Differential Equations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Neuro-symbolic AI aims to combine deep learning with
knowledge-based systems, bridging the gap between
statistical learning and symbolic reasoning. Physics-Informed
Neural Networks (PINNs) fall under this paradigm, as they
integrate prior knowledge, in the form of physically inspired
diferential equations, into the learning process. In PINNs,
this physical knowledge is encoded through Partial
Diferential Equations (PDEs) and is embedded into the neural
network training, enforcing physical laws or prior
knowledge during the learning process into the model weights.
This enables PINNs to learn solutions that are consistent not
only with observational data but also with the governing
physics of the problem.</p>
      <p>PDEs play a fundamental role in modeling a wide range of
physical phenomena across science and engineering,
including fields like fluid dynamics, heat transfer, and others [1, 2].
Traditional numerical methods, such as the finite element
method, have long been the standard for solving PDEs [3].
While these methods are robust and well-established, they
often face challenges when extended to high-dimensional
problems and can suffer from high computational cost and
very slow inference [4]. In contrast, PINNs represent a
novel approach to solving PDEs by leveraging the
expressive power and fast inference of deep neural networks [5].</p>
      <p>Despite their potential, they are not without limitations.
It is generally known that PINNs can be hard to train. A
common challenge is the balancing of different loss
components, such as the data-fitting term and the PDE residual
term, which often requires careful and costly
hyperparameter tuning, as an imbalance in these terms can lead to slow
convergence or a failure to learn a good solution [6].</p>
      <p>To address these challenges, we introduce Constraint
Guided Physics-Informed Neural Networks (CGPINNs),
which reformulates the training of PINNs as a constrained
optimization problem. To solve this, we propose a novel
training methodology based on Constraint Guided Gradient
Descent (CGGD) [7]. CGGD is a learning framework that
enables the training of deep learning models by
minimizing an objective function while explicitly satisfying a set of
constraints, including those involving continuous variables.
This approach allows constraints to be enforced directly
during training, thereby eliminating the need to manually
tune weighting hyperparameters for balancing multiple loss
terms.</p>
      <p>This article is organised as follows. In Section 2, we
discuss related work on addressing PINN training difficulties,
including adaptive weighting strategies and architectural
modifications. Section 3 details our methodology, starting
with an introduction to the CGGD algorithm, followed by
the new CGPINN method, its learning objective and the
inclusion of initial and boundary condition constraints.
Section 4 outlines heat diffusion experiments, covering the
setup and physical configurations, data generation,
evaluation metrics, network architecture, and the training process.
Section 5 presents and discusses the results, comparing
CGPINN’s performance against a vanilla PINN baseline. Finally,
Section 6 provides the conclusion and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The challenge of effectively training PINNs is widely
recognized in the scientific machine learning community
[8, 9, 10]. One of the core difficulties originates from the
multi-objective nature of the standard training process,
which relies on minimizing a composite loss function. This
loss typically combines a data-fitting term with a
physics-based residual term that enforces a PDE. These different
objectives often result in a difficult training process, as the
corresponding loss terms can have vastly different scales
and gradient magnitudes. This “gradient pathology” [11]
can cause the optimization process to be dominated by one
objective, leading to slow convergence or a failure to find a
physically meaningful solution.</p>
      <p>Much of the existing research on multi-objective
optimization, though not specific to PINNs, has focused on
addressing this issue through adaptive weighting strategies
[12]. These methods aim to dynamically adjust the
relative importance of each loss component during training,
in an effort to achieve a more balanced and effective
optimization process. Examples include approaches such as
GradNorm [13] that adjust weights based on the norm of
the gradients of each loss term, attempting to ensure that all
objectives contribute meaningfully to the weight updates.
Other techniques, such as Learning Rate Annealing [11],
assign different learning rates to different parts of the loss
function and anneal them over time. While often effective,
these methods can introduce new hyperparameters that
require careful, and often expensive, tuning.</p>
      <p>Another line of research has explored architectural
modifications to improve PINN performance. Some studies have
demonstrated that using specialized architectures or
adaptive elements can enhance the network’s ability to
approximate complex solutions [14, 11]. Others have incorporated
techniques like Fourier feature mappings [15] to help the
network learn high-frequency components that are
common in physical phenomena but are notoriously difficult
for standard MLPs to capture. While beneficial, these
architectural changes do not fundamentally alter the underlying
training challenge of balancing competing loss objectives.</p>
      <p>Our work takes a different path by reframing the PINN
training problem from a multi-objective optimization task to
a constrained optimization task. Instead of balancing
competing objectives, we treat the physical laws as constraints
that the solution must satisfy.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>At the core of our proposed methodology for training
CGPINNs lies the CGGD algorithm [7]. We begin with a brief
introduction to CGGD before detailing the CGPINN
framework.</p>
      <sec id="sec-3-1">
        <title>3.1. Constraint Guided Gradient Descent (CGGD)</title>
        <p>CGGD is an optimization framework that enhances
traditional gradient descent by incorporating hard inequality
constraints into the training process. Unlike conventional
approaches that rely solely on minimizing data-driven or
multi-objective loss functions, CGGD introduces a
mechanism to enforce a set of constraints throughout the training.</p>
        <p>At each iteration, the method checks whether the current
prediction is feasible, i.e., whether it belongs to the set of
predictions that can be obtained from models satisfying
all constraints on the training set. We refer to this set as
the Feasible Region (FR). If the constraints are satisfied, the
update proceeds as in standard gradient descent, optimizing
the loss without modification. Consider, at training iteration
t, the update of the set of model weights w_t.
When constraints are violated, the update is guided not only
by the gradient of the loss function but also by a corrective
direction d_t that steers the model towards the FR. Before
combining these vectors, the constraint direction is rescaled
to match the norm of the loss gradient. It is then scaled
by a factor greater than 1 to ensure it dominates the update
step. By default, this rescale factor is set to 1.5, although any
value greater than 1 is sufficient. This approach guarantees
that the updated model moves closer to the FR [7]. An
illustration of this process for a two-weight update is shown
in Fig. 1, where the loss gradient and constraint direction
are shown in red and blue, respectively.</p>
        <p>[Fig. 1: successive weight updates w_{t+1}, ..., w_{t+4} moving towards the Feasible Region (FR).]</p>
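        <p>The guided update described above can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors’ implementation; all names are hypothetical, and the rescale factor of 1.5 is the default mentioned in the text.</p>

```python
import numpy as np

def cggd_step(w, loss_grad, constraint_dir, constraints_satisfied,
              lr=0.01, rescale=1.5):
    # Feasible prediction: plain gradient descent on the loss.
    if constraints_satisfied:
        return w - lr * loss_grad
    # Infeasible: rescale the constraint direction to the norm of the
    # loss gradient, then multiply by a factor greater than 1 so that
    # the corrective direction dominates the update step.
    matched = constraint_dir * (np.linalg.norm(loss_grad)
                                / np.linalg.norm(constraint_dir))
    return w - lr * (loss_grad + rescale * matched)

# Toy two-weight example, as in Fig. 1.
w = np.array([1.0, -0.5])
loss_grad = np.array([0.2, 0.1])
constraint_dir = np.array([0.0, 1.0])  # direction pointing towards the FR
w_next = cggd_step(w, loss_grad, constraint_dir, constraints_satisfied=False)
```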
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Constraint-Guided PINN (CGPINN)</title>
        <p>PINNs offer a framework for incorporating physical laws,
expressed as differential equations, directly into the training
of neural networks. The vanilla training objective of a PINN
[5] is defined by

argmin_W  λ · ℒ(Φ(X, W), Y) + (1 − λ) · ℒ_PDE(Φ(X_c, W)),

where the loss function ℒ measures the difference, typically
by considering the Mean Squared Error (MSE), between the
observations Y corresponding to the observed inputs X
and the predictions of the neural network Φ with learnable
weights W. ℒ_PDE measures how well the differential
equation is obeyed by the neural network Φ for (typically) both
the observations and the unobserved collocation samples,
in this work named X_c. The smaller its value, the better it is
obeyed. To balance both terms properly, a hyperparameter
λ is present which needs tuning as indicated before. Note
that the PDE can also be an ordinary differential equation.
This methodology is visualized in Fig. 2.</p>
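        <p>For concreteness, the weighted vanilla-PINN objective can be sketched as a simple function of the data fit and the PDE residuals (an illustrative sketch with hypothetical names, assuming MSE for both terms):</p>

```python
import numpy as np

def vanilla_pinn_loss(pred, target, pde_residual, lam):
    # lam balances the data-fitting MSE against the mean squared PDE residual.
    data_loss = np.mean((pred - target) ** 2)
    pde_loss = np.mean(pde_residual ** 2)
    return lam * data_loss + (1.0 - lam) * pde_loss

# Example: equal weighting of a small data error and a larger PDE residual.
loss = vanilla_pinn_loss(np.array([1.0, 2.0]), np.array([1.0, 1.0]),
                         np.array([2.0]), lam=0.5)  # -> 2.25
```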
        <p>Instead of relying on the manually tuned weighting
parameter λ, CGPINN reframes the training objective as a
constrained optimization problem, where the governing
PDE is enforced directly through explicit constraints. The
new training objective of CGPINN is therefore defined as

argmin_W  ℒ(Φ(X, W), Y)   s.t.   ℒ_PDE(Φ(X_c, W)) ≤ ε,

where the constraint tolerance ε specifies the allowable
deviation of the neural network’s predictions from the
governing PDE. In this work, we initialize ε at a relatively large
value (0.1), corresponding to a large FR, and progressively
reduce it during training.</p>
        <p>However, this PDE constraint
alone still allows for a trivial solution to be found. If both
derivatives in the PDE are zero, the equation is satisfied, but the
result is meaningless. In other words, as long as a trivial
solution satisfies a constraint, it can happen that the model
will converge to it. Therefore, we must prevent this from
happening to ensure a meaningful result is found.</p>
        <p>
          To avoid converging to a trivial solution, where all
derivatives computed via AD are zero, we introduce two additional
sets of constraints that address special cases of the
governing PDE. To illustrate these constraints, we consider the
example of diffusion in a one-dimensional rod with
Dirichlet boundary conditions, as presented in [16]. While this
example is used for clarity, the proposed technique is
general and not limited to any specific type of PDE. For the
diffusion case under consideration, the governing PDE is
given by

∂u/∂t (x, t) = α ∂²u/∂x² (x, t),   (1)

where α is the thermal diffusivity coefficient. For the
Dirichlet boundary conditions, the Initial Conditions (ICs) and the
        </p>
        <sec id="sec-3-2-1">
          <title>Boundary Conditions (BCs) are given by</title>
          <p>IC: u(x, 0) = sin(πx/L),   for x ∈ [0, L],   (2)
BC: u(0, t) = u(L, t) = 0,   for t ∈ [0, T].   (3)</p>
          <p>
            The first additional set of constraints is obtained by
substituting the function (2) into the right-hand side of (1). This
yields

∂u/∂t (x, 0) = −α (π/L)² sin(πx/L),   for t = 0, x ∈ [0, L].

In other words, by combining the spatial partial derivatives
of the initial conditions with the governing PDE, we
derive an additional constraint. If the spatial derivatives are
zero, this constraint will be violated. Consequently, the FR
will exclude models that produce zero spatial derivatives,
effectively preventing such trivial solutions. This set of
constraints will be referred to from now on as ICCon.
          </p>
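          <p>This identity can be checked numerically against the analytical solution u(x, t) = sin(πx/L) · exp(−α(π/L)² t) used in Section 4, here via a central finite difference in time (an illustrative check; the values of L, α and x are arbitrary):</p>

```python
import math

L, alpha = 5.0, 0.04

def u(x, t):
    # analytical solution of the 1-D diffusion problem
    return math.sin(math.pi * x / L) * math.exp(-alpha * (math.pi / L) ** 2 * t)

def dudt(x, t, h=1e-6):
    # central finite difference in time
    return (u(x, t + h) - u(x, t - h)) / (2 * h)

x = 1.7
lhs = dudt(x, 0.0)                                             # du/dt at t = 0
rhs = -alpha * (math.pi / L) ** 2 * math.sin(math.pi * x / L)  # ICCon value
# lhs and rhs agree up to finite-difference error
```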
          <p>
            Similarly, the second set of constraints is obtained by
substituting the function in (3) into the left-hand side of (1).
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>This yields</title>
          <p>0 = α ∂²u/∂x² (x, t),   for x ∈ {0, L}, t ∈ [0, T].</p>
          <p>As with the PDE constraint, a slack variable ε
is introduced that allows some tolerance on the
constraint. To summarize, the optimization objective of
CG</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>PINNs is defined by</title>
          <p>argmin_W  ℒ(Φ(X, W), Y)
s.t.  ℒ_PDE(Φ(X_c, W)) ≤ ε,
      ℒ_ICCon(Φ(X, W)) ≤ ε,
      ℒ_BCCon(Φ(X, W)) ≤ ε.</p>
          <p>This constrained optimization problem can be solved
directly by using CGGD [7]. In this work, the loss function ℒ
is defined as the MSE between the boundary samples and
their ground truth values. The function ℒ_PDE(Φ(X_c, W))
internally uses the required derivatives of the network output
with respect to the input variables, computed using AD. The
resulting residual quantifies how well the PDE is satisfied
at the sampled collocation points X_c. The direction
of each constraint is computed by calculating the
derivative of ℒ_PDE(Φ(X_c, W)). To ensure balanced influence, this
direction vector is scaled to have the same norm as the
corresponding loss gradient, an approach analogous to the
multi-head factor scaling used in [17]. A similar procedure
is applied to the remaining constraint terms.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>To validate our method and compare it against a vanilla
PINN, we implement a heat diffusion experiment based on
[16]. The setup models one-dimensional heat diffusion in
a rod of length L, capturing the temperature distribution
u(x, t) over time t. The governing physical process is
described by the following PDE:

∂u(x, t)/∂t = α ∂²u(x, t)/∂x²</p>
      <p>
        Following [16], we explore several configurations of
the rod length L and thermal diffusivity coefficient α.
Specifically, we investigate the parameter pairs (L, α) ∈
{(5, 0.04), (5, 1), (1, 1), (1, 25)}. These configurations
cover a broad range of physical regimes, from slow to fast
diffusion with varying rod length L. To enable
meaningful comparisons across these settings, each simulation is
run over a time horizon defined by the diffusive time scale
τ = L²/α. This characteristic time scale allows us to
normalize the simulation duration relative to the physical
properties of each setup and ensure that the dynamics are
compared over equivalent stages of diffusion.
      </p>
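      <p>As a quick illustration, the diffusive time scales τ = L²/α for the four configurations work out to:</p>

```python
# tau = L**2 / alpha for each (L, alpha) configuration
configs = [(5, 0.04), (5, 1), (1, 1), (1, 25)]
taus = [L ** 2 / alpha for (L, alpha) in configs]
print(taus)  # -> [625.0, 25.0, 1.0, 0.04]
```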
      <sec id="sec-4-1">
        <title>The analytical solution of the PDE is</title>
        <p>u(x, t) = sin(πx/L) · exp(−α (π/L)² t),

and is used to generate the training, validation, and test sets.</p>
        <p>For the initial condition, spatial points are randomly
sampled from the domain, where u(x, 0) = sin(πx/L).
Boundary values are sampled along the temporal domain at the
Dirichlet boundaries x = 0 and x = L, with fixed
temperatures u(0, t) = u(L, t) = 0. To enforce the PDE,
collocation points are sampled within the spatiotemporal domain
using Latin Hypercube Sampling (LHS) [18]. Fig. 3
provides a visual representation of the domain and the sampled
points.</p>
        <p>The training dataset consists of 128 initial and 128
boundary condition samples along with 1024 sampled collocation
points. The validation and test datasets consist of 1024
collocation samples each. While the training and validation
sets are re-sampled at each iteration to improve
generalization, the test set is randomly sampled once to evaluate the
model’s final performance.</p>
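        <p>The collocation sampling can be sketched with a minimal hand-rolled LHS [18] (for illustration only; library implementations such as scipy.stats.qmc.LatinHypercube could be used instead):</p>

```python
import numpy as np

def latin_hypercube(n, d, rng):
    # One sample per stratum along each dimension, with shuffled strata.
    samples = np.empty((n, d))
    for j in range(d):
        strata = rng.permutation(n)
        samples[:, j] = (strata + rng.random(n)) / n
    return samples

rng = np.random.default_rng(0)
# 1024 collocation points in the (x, t) domain [0, L] x [0, tau],
# here for the (L = 5, alpha = 0.04) setup with tau = L**2 / alpha = 625.
collocation = latin_hypercube(1024, 2, rng) * np.array([5.0, 625.0])
```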
        <p>[Fig. 3: the spatio-temporal domain (t vs. x) with collocation, boundary condition, and initial condition samples.]</p>
        <sec id="sec-4-1-1">
          <title>4.1. Evaluation Metric</title>
          <p>The primary performance indicator for all models is the
test loss, computed as the mean squared (prediction) error
(MSE) on an unseen test set of 1024 points sampled with
LHS from the spatio-temporal domain. The test loss reflects
how well the model generalizes to unseen data and provides
a direct measure of predictive accuracy. It is chosen as the
main metric because it quantifies the discrepancy between
the learned solution and the true analytical solution in a
data-driven and interpretable way.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.2. Network Architecture</title>
          <p>A standard Multilayer Perceptron (MLP) model is used for
the experiments. The network architecture consists of 4
hidden layers, each comprising 50 neurons. The hyperbolic
tangent activation function is applied in all hidden layers, a
common choice in both the PINN and regression literature
due to its smoothness and its capacity to support
higher-order derivatives, which are crucial for accurately modeling
and solving PDEs.</p>
          <p>The network is designed to approximate the solution
(, ) of the PDE. It takes a two-dimensional input vector,
consisting of the spatial coordinate  and the temporal
coordinate , and produces a single scalar output representing
the predicted temperature .</p>
          <p>The network’s forward pass is augmented to not only
compute the output u, but also to leverage AD to calculate
the partial derivatives that are required to enforce
the PDE, namely ∂u/∂x, ∂u/∂t, and ∂²u/∂x². These derivatives are
used to compute the PDE and boundary residuals, which are
needed to enforce the physical constraints during training.
Furthermore, to improve training stability, the input
coordinates (x, t) are scaled to a normalized range before being
processed by the network. The derivatives are calculated
with respect to the original non-normalized input
coordinates so that gradients and constraints can be formulated
in the original physical or domain-specific units, and the
PDE captures the relation between the original physical
quantities.</p>
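          <p>The chain-rule bookkeeping between normalized and physical coordinates can be illustrated with a stand-in function for the network (all names hypothetical; here the spatial input is assumed to be normalized as x_hat = x / L):</p>

```python
import math

L = 5.0  # rod length used to normalize the spatial input (illustrative)

def net(x_hat):
    # stand-in for the MLP evaluated on the normalized coordinate x_hat = x / L
    return math.sin(x_hat)

def dnet_dxhat(x_hat):
    # derivative of the stand-in network w.r.t. its normalized input
    return math.cos(x_hat)

x = 2.0
x_hat = x / L
# Chain rule: derivative w.r.t. the original physical coordinate x picks up
# a factor 1/L (and the second derivative a factor (1/L)**2 analogously).
du_dx = dnet_dxhat(x_hat) * (1.0 / L)
```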
        </sec>
        <sec id="sec-4-1-3">
          <title>4.3. Training Process</title>
          <p>To ensure reproducibility, pseudo-random seeds are set for
Python’s built-in random module, as well as NumPy and
PyTorch. Deterministic behavior is also enabled where
applicable. A base seed, provided via the configuration or script
arguments, serves as the foundation from which individual
seeds are derived for each component that requires one. This
is to ensure that independence between random number
generators is maintained, avoiding unintended correlations
while preserving reproducibility across runs.</p>
          <p>We use the ADAM optimizer [19] in combination with
an exponential learning rate scheduler, following a similar
setup used in [16]. The initial learning rate is set to 10⁻³ and
decays exponentially by a factor of 0.9. If no improvement
occurs within 1000 consecutive epochs, the learning rate is
reduced according to the predefined schedule.</p>
          <p>The constraint tolerance parameter  defines the
allowable error for considering a constraint as satisfied. It is
initially set to 1 and is progressively reduced when a
constraint satisfaction rate of 95% is achieved. This dynamic
adjustment enables automatic tuning of the tolerance for
each experiment and helps prevent the use of overly strict
tolerances, which could otherwise degrade the performance
of CGGD. Whenever the tolerance is dynamically reduced,
the learning rate is reset to the original value to allow the
model to restart learning from a better initialization.</p>
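          <p>This dynamic adjustment can be sketched as follows (illustrative names; the reduction factor of 0.5 is an assumption, as the text does not specify it):</p>

```python
def update_tolerance(eps, lr, satisfaction_rate, base_lr=1e-3,
                     threshold=0.95, factor=0.5):
    # Once the constraint satisfaction rate reaches the threshold,
    # tighten the tolerance and reset the learning rate to its base value.
    if satisfaction_rate >= threshold:
        return eps * factor, base_lr
    return eps, lr

# 97% of constraints satisfied: eps is tightened and the lr is reset.
eps, lr = update_tolerance(1.0, 5e-4, satisfaction_rate=0.97)
```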
          <p>The training objective depends on the executed
experiment. For the vanilla PINN, the objective is a direct
minimization of the loss, which is a weighted sum of MSEs of
the boundary data and the collocation data based on the
hyperparameter λ:</p>
          <p>argmin_W  λ · (1/N) Σ_{i=1..N} (y_obs,i − Φ(x_obs,i, W))²
+ (1 − λ) · (1/N_c) Σ_{j=1..N_c} (ℒ_PDE(Φ(x_j, W)))²</p>
          <p>For our CGPINN method, the objective is determined by
CGGD: satisfy the constraints while simultaneously
minimizing the data loss. Metrics are aggregated over
10-epoch intervals for logging and plot generation.
Checkpointing and other scheduling functionalities, however,
continue to rely on per-epoch metrics directly.</p>
        </sec>
        <sec id="sec-4-1-4">
          <title>4.4. Implementation Details</title>
          <p>The experiments are implemented in Python using PyTorch
[20]. In the experiments, gradients are computed using
PyTorch’s AD functionality. Although it is not strictly
necessary to calculate them in this way, AD provides a
straightforward and reliable method, simplifying the implementation
of gradient-based procedures and reducing the potential
for manual errors in derivative calculations. This approach
allows for efficient experimentation without
compromising flexibility, as alternative gradient computation methods
could also be employed if desired.</p>
          <p>In PyTorch, gradients can only be computed with respect
to leaf tensors. Consequently, it is not possible to compute
the gradient of the constraints directly with respect to the
model output, since output tensors are typically non-leaf
nodes in the computational graph. To address this, we create
a leaf tensor filled with ones, of shape [batch_size, 1], with
gradient tracking enabled. This tensor is element-wise
multiplied with the output to produce a new tensor for which
gradients of the constraints can be computed. The resulting
gradient can then be transformed back to the gradient with
respect to the original output by dividing by the output
values.</p>
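          <p>This workaround can be sketched in PyTorch as follows, using a toy model and a toy constraint for illustration (not the paper’s actual constraint functions):</p>

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(2, 1)
x = torch.randn(8, 2)

y = model(x)                              # non-leaf output tensor
ones = torch.ones(y.shape[0], 1, requires_grad=True)
y_tracked = y * ones                      # tied to a leaf with grad tracking

constraint = (y_tracked ** 2).sum()       # toy constraint term
constraint.backward()

# d(constraint)/d(ones) = d(constraint)/dy * y, so dividing by the output
# values recovers the gradient w.r.t. the output itself (here: 2 * y).
grad_wrt_output = ones.grad / y
```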
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results &amp; Discussion</title>
      <p>We compare the performance of our CGPINN approach
against a standard vanilla PINN across four different
physical configurations, varying the rod length L and thermal
diffusivity α. For the vanilla PINN, we report the test set
MSE loss for five different values of the weighting
hyperparameter λ to showcase its sensitivity. For CGPINN, no such
hyperparameter is needed and experiments were repeated
over 3 different random seeds. In this case, we report the
mean and standard deviation of the test set MSE loss.
(Congrads, a Python toolbox for constraint-guided deep learning, is
available at https://github.com/ML-KULeuven/congrads.)</p>
      <sec id="sec-5-1">
        <title>5.1. Baseline Performance</title>
        <p>The results in Table 2 highlight the sensitivity of the vanilla
PINN to the choice of the hyperparameter λ. For each
physical configuration, the performance varies significantly
across different λ values. For instance, in the (L = 5, α =
0.04) case, the Test MSE changes by an order of
magnitude depending on λ. The optimal value of λ is
inconsistent across different configurations; λ = 0.5 is best
for (L = 1, α = 1), but it performs poorly for the
high-diffusivity case (L = 1, α = 25). This demonstrates that
finding the right balance requires a costly, problem-specific
hyperparameter search. In the challenging high-diffusivity
scenario, the vanilla PINN fails to converge to a good
solution for any tested λ, with MSE values several orders of
magnitude higher than in other cases.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. CGPINN Performance</title>
        <p>For CGPINN, at least
95% of the test samples satisfy all imposed constraints within
the tolerance ε. This
demonstrates the method’s ability to consistently enforce
constraints across different parameter regimes.</p>
        <p>[Table: CGPINN results for the parameter configurations (L, α) ∈ {(5, 0.04), (5, 1), (1, 1), (1, 25)}.]</p>
        <p>The left panel of the corresponding figure
shows the analytical solution of the PDE; the center panel displays the vanilla PINN prediction (small) with hyperparameter
λ = 0.5 and the corresponding difference from the analytical solution (large); the right panel presents the same results for our
CGPINN method.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion &amp; Future work</title>
      <p>In this paper, the framework CGPINN is introduced for
training PINNs using constrained optimization. By
leveraging CGGD, the proposed approach eases the need for
delicate and costly hyperparameter tuning associated with
balancing multiple loss terms in traditional PINNs. Two
novel sets of constraints are proposed, ICCon and BCCon,
which are derived from the governing PDE at the domain
boundaries and prevent the model from learning trivial
solutions.</p>
      <p>The experiments on a 1D heat diffusion problem
demonstrate that CGPINN provides a more stable and robust
training procedure. It achieves performance comparable to, and in
challenging cases superior to, a well-tuned vanilla PINN, without
requiring any sensitive weighting hyperparameters. This
makes the process of developing and training PINNs simpler
and more reliable.</p>
      <p>One possible direction for future work is to apply
CGPINN to more complex, multi-dimensional PDEs. Another
line of follow-up research is to compare CGPINN
to existing state-of-the-art techniques like [11, 13] and to
further investigate the interplay between constraint tolerance
scheduling and learning rate scheduling to further improve
convergence speed and solution accuracy.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This research was supported by the DTF-PINN SBO project
of Flanders Make, the strategic research centre for the
manufacturing industry of Flanders, Belgium, and received
funding from the Flemish Government (AI Research Program).
During the preparation of this work, the authors used
ChatGPT, Microsoft Copilot and Gemini to draft
content, paraphrase and reword, improve writing style, and
check grammar and spelling. After using these tools, the authors
reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
      <sec id="sec-8-1">
        <title>References</title>
        <p>[3] M. Baccouch, A brief summary of the finite element
method for differential equations, in: M.
Baccouch (Ed.), Finite Element Methods and Their
Applications, IntechOpen, London, 2021. doi:10.5772/
intechopen.95423.</p>
        <p>[4] T. G. Grossmann, U. J. Komorowska, J. Latz, C.-B.
Schönlieb, Can physics-informed neural networks
beat the finite element method?, 2023. doi:10.48550/
ARXIV.2302.04107.
[5] M. Raissi, P. Perdikaris, G. Karniadakis,
Physics-informed neural networks: A deep learning
framework for solving forward and inverse problems
involving nonlinear partial differential equations,
Journal of Computational Physics 378 (2019) 686–707.
doi:10.1016/j.jcp.2018.10.045.
[6] S. Basir, Investigating and mitigating failure modes
in physics-informed neural networks (pinns), 2022.
doi:10.48550/ARXIV.2209.09988.
[7] Q. Van Baelen, P. Karsmakers, Constraint guided
gradient descent: Training with inequality constraints with
applications in regression and semantic segmentation,
Neurocomputing 556 (2023) 126636. doi:10.1016/j.
neucom.2023.126636.
[8] P. Rathore, W. Lei, Z. Frangella, L. Lu, M. Udell,
Challenges in training pinns: A loss landscape perspective,
2024. doi:10.48550/ARXIV.2402.01868.
[9] A. S. Krishnapriyan, A. Gholami, S. Zhe, R. M. Kirby,
M. W. Mahoney, Characterizing possible failure modes
in physics-informed neural networks, 2021. doi:10.
48550/ARXIV.2109.01050.
[10] A. Farea, O. Yli-Harja, F. Emmert-Streib,
Understanding physics-informed neural networks: Techniques,
applications, trends, and challenges, AI 5 (2024)
1534–1557. doi:10.3390/ai5030074.
[11] S. Wang, Y. Teng, P. Perdikaris, Understanding
and mitigating gradient flow pathologies in
physics-informed neural networks, SIAM Journal on Scientific
Computing 43 (2021) A3055–A3081. doi:10.1137/
20m1318043.
[12] R. Bischof, M. A. Kraus, Multi-objective loss balancing
for physics-informed deep learning, Computer
Methods in Applied Mechanics and Engineering 439 (2025)
117914. doi:10.1016/j.cma.2025.117914.
[13] Z. Chen, V. Badrinarayanan, C.-Y. Lee, A. Rabinovich,
Gradnorm: Gradient normalization for adaptive loss
balancing in deep multitask networks, 2017. doi:10.
48550/ARXIV.1711.02257.
[14] H. Bi, T. D. Abhayapala, Point neuron
learning: a new physics-informed neural network
architecture, EURASIP Journal on Audio, Speech,
and Music Processing 2024 (2024). doi:10.1186/
s13636-024-00376-0.
[15] M. Tancik, P. P. Srinivasan, B. Mildenhall, S.
Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. T.
Barron, R. Ng, Fourier features let networks learn high
frequency functions in low dimensional domains, 2020.
doi:10.48550/ARXIV.2006.10739.
[16] F. M. Rohrhofer, S. Posch, C. Gößnitzer, B. C. Geiger,
Data vs. physics: The apparent pareto front of
physics-informed neural networks, IEEE Access
11 (2023) 86252–86261. doi:10.1109/ACCESS.2023.
3302892.
[17] Y. Tefera, Q. Van Baelen, M. Meire, S. Luca, P.
Karsmakers, Constraint-guided learning of data-driven health
indicator models: An application on the pronostia
bearing dataset, 2025. doi:10.48550/ARXIV.2503.
09113.
[18] M. D. McKay, R. J. Beckman, W. J. Conover, A
comparison of three methods for selecting values of input
variables in the analysis of output from a computer code,
Technometrics 21 (1979) 239. doi:10.2307/1268522.
[19] D. P. Kingma, J. Ba, Adam: A method for stochastic
optimization, 2014. doi:10.48550/ARXIV.1412.6980.
[20] A. Paszke, S. Gross, F. Massa, A. Lerer, J.
Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein,
L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z.
DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner,
L. Fang, J. Bai, S. Chintala, Pytorch: An imperative
style, high-performance deep learning library, 2019.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yin</surname>
          </string-name>
          , G. E. Karniadakis,
          <article-title>Physics-informed neural networks (pinns) for fluid mechanics: a review</article-title>
          ,
          <source>Acta Mechanica Sinica</source>
          <volume>37</volume>
          (
          <year>2021</year>
          ) [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perdikaris</surname>
          </string-name>
          , G. E. Karniadakis,
          <article-title>Physics-informed neural networks for heat transfer problems</article-title>
          ,
          <source>Journal of Heat Transfer</source>
          <volume>143</volume>
          (
          <year>2021</year>
          ). [4]
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Grossmann</surname>
          </string-name>
          , U. J.
          <string-name>
            <surname>Komorowska</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Latz</surname>
          </string-name>
          , C.-B.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>