<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1007/s10409-021-01148-1</article-id>
      <title-group>
        <article-title>Constraint-Guided PINNs: A Constrained Optimization Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wout Rombouts</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Quinten Van Baelen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Karsmakers</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Flanders Make @ KU Leuven</institution>
          ,
          <addr-line>B-3000 Leuven</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KU Leuven, Dept. of Computer Science</institution>
          ,
          <addr-line>Kleinhoefstraat 4, B-2440 Geel</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Leuven.AI - KU Leuven Institute for AI</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>1727</fpage>
      <lpage>1738</lpage>
      <abstract>
        <p>Physics-Informed Neural Networks (PINNs) have emerged as a powerful tool for solving Partial Differential Equations (PDEs) by integrating physical laws into the learning process. However, PINNs often struggle with training instability and the challenge of balancing multiple loss terms, which typically requires extensive hyperparameter tuning. In this paper, we introduce Constraint Guided Physics-Informed Neural Networks (CGPINNs), a novel approach that leverages Constraint Guided Gradient Descent (CGGD) to train PINNs. CGPINN reframes the learning problem as a constrained optimization task, replacing complex hyperparameter balancing with more intuitive, semantically meaningful parameters. We also propose to add two sets of constraints derived from the PDE at the initial and boundary conditions, which prevent the model from converging to trivial solutions when using CGGD. Our experiments on a simulated heat diffusion problem demonstrate that CGPINN offers a more stable and robust training procedure, effectively learning the underlying physics without the need for expensive hyperparameter searches.</p>
      </abstract>
      <kwd-group>
        <kwd>Physics-Informed Neural Networks</kwd>
        <kwd>Constraint-Guided Gradient Descent</kwd>
        <kwd>Neuro-Symbolic AI</kwd>
        <kwd>Constrained Optimization</kwd>
        <kwd>Partial Differential Equations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Neuro-symbolic AI aims to combine deep learning with
knowledge-based systems, bridging the gap between
statistical learning and symbolic reasoning. Physics-Informed
Neural Networks (PINNs) fall under this paradigm, as they
integrate prior knowledge, in the form of physically inspired
diferential equations, into the learning process. In PINNs,
this physical knowledge is encoded through Partial
Diferential Equations (PDEs) and is embedded into the neural
network training, enforcing physical laws or prior
knowledge during the learning process into the model weights.
This enables PINNs to learn solutions that are consistent not
only with observational data but also with the governing
physics of the problem.</p>
      <p>PDEs play a fundamental role in modeling a wide range of
physical phenomena across science and engineering,
including fields like fluid dynamics, heat transfer, and others [1, 2].
Traditional numerical methods, such as the finite element
method, have long been the standard for solving PDEs [3].
While these methods are robust and well-established, they
often face challenges when extended to high-dimensional
problems and can suffer from high computational cost and
very slow inference [4]. In contrast, PINNs represent a
novel approach to solving PDEs by leveraging the
expressive power and fast inference of deep neural networks [5].</p>
      <p>Despite their potential, they are not without limitations.
It is generally known that PINNs can be hard to train. A
common challenge is the balancing of different loss
components, such as the data-fitting term and the PDE residual
term, which often requires careful and costly
hyperparameter tuning, as an imbalance in these terms can lead to slow
convergence or a failure to learn a good solution [6].</p>
      <p>To address these challenges, we introduce Constraint
Guided Physics-Informed Neural Networks (CGPINNs),
which reformulates the training of PINNs as a constrained
optimization problem. To solve this, we propose a novel
training methodology based on Constraint Guided Gradient
Descent (CGGD) [7]. CGGD is a learning framework that
enables the training of deep learning models by
minimizing an objective function while explicitly satisfying a set of
constraints, including those involving continuous variables.
This approach allows constraints to be enforced directly
during training, thereby eliminating the need to manually
tune weighting hyperparameters for balancing multiple loss
terms.</p>
      <p>This article is organised as follows. In Section 2, we
discuss related work on addressing PINN training difficulties,
including adaptive weighting strategies and architectural
modifications. Section 3 details our methodology, starting
with an introduction to the CGGD algorithm, followed by
the new CGPINN method, its learning objective and the
inclusion of initial and boundary condition constraints.
Section 4 outlines heat diffusion experiments, covering the
setup and physical configurations, data generation,
evaluation metrics, network architecture, and the training process.
Section 5 presents and discusses the results, comparing
CGPINN’s performance against a vanilla PINN baseline. Finally,
Section 6 provides the conclusion and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The challenge of effectively training PINNs is widely
recognized in the scientific machine learning community
[8, 9, 10]. One of the core difficulties originates from the
multi-objective nature of the standard training process,
which relies on minimizing a composite loss function. This
loss typically combines a data-fitting term with a
physics-based residual term that enforces a PDE. These different
objectives often result in a difficult training process, as the
corresponding loss terms can have vastly different scales
and gradient magnitudes. This “gradient pathology” [11]
can cause the optimization process to be dominated by one
objective, leading to slow convergence or a failure to find a
physically meaningful solution.</p>
      <p>Much of the existing research on multi-objective
optimization, though not specific to PINNs, has focused on
addressing this issue through adaptive weighting strategies
[12]. These methods aim to dynamically adjust the
relative importance of each loss component during training,
in an effort to achieve a more balanced and effective
optimization process. Examples include approaches such as
GradNorm [13] that adjust weights based on the norm of
the gradients of each loss term, attempting to ensure that all
objectives contribute meaningfully to the weight updates.
Other techniques, such as Learning Rate Annealing [11],
assign different learning rates to different parts of the loss
function and anneal them over time. While often effective,
these methods can introduce new hyperparameters that
require careful, and often expensive, tuning.</p>
      <p>Another line of research has explored architectural
modifications to improve PINN performance. Some studies have
demonstrated that using specialized architectures or
adaptive elements can enhance the network’s ability to
approximate complex solutions [14, 11]. Others have incorporated
techniques like Fourier feature mappings [15] to help the
network learn high-frequency components that are
common in physical phenomena but are notoriously difficult
for standard MLPs to capture. While beneficial, these
architectural changes do not fundamentally alter the underlying
training challenge of balancing competing loss objectives.</p>
      <p>Our work takes a different path by reframing the PINN
training problem from a multi-objective optimization task to
a constrained optimization task. Instead of balancing
competing objectives, we treat the physical laws as constraints
that the solution must satisfy.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>At the core of our proposed methodology for training
CGPINNs lies the CGGD algorithm [7]. We begin with a brief
introduction to CGGD before detailing the CGPINN
framework.</p>
      <sec id="sec-3-1">
        <title>3.1. Constraint Guided Gradient Descent (CGGD)</title>
        <p>CGGD is an optimization framework that enhances
traditional gradient descent by incorporating hard inequality
constraints into the training process. Unlike conventional
approaches that rely solely on minimizing data-driven or
multi-objective loss functions, CGGD introduces a
mechanism to enforce a set of constraints throughout the training.</p>
        <p>At each iteration, the method checks whether the current
prediction is feasible, i.e., whether it belongs to the set of
predictions that can be obtained from models satisfying
all constraints on the training set. We refer to this set as
the Feasible Region (FR). If the constraints are satisfied, the
update proceeds as in standard gradient descent, optimizing
the loss without modification. Consider, at training iteration
t, the update of the set of model weights w_t.
When constraints are violated, the update is guided not only
by the gradient of the loss function but also by a corrective
direction d_t that steers the model towards the FR. Before
combining these vectors, the constraint direction is rescaled
to match the norm of the loss gradient. It is then scaled
by a factor greater than 1 to ensure it dominates the update
step. By default, this rescale factor is set to 1.5, although any
value greater than 1 is sufficient. This approach guarantees
that the updated model moves closer to the FR [7]. An
illustration of this process for a two-weight update is shown
in Fig. 1, where the loss gradient and constraint direction
are shown in red and blue, respectively.</p>
        <p>[Fig. 1: successive weight updates w_{t+1}, ..., w_{t+4} moving towards the Feasible Region (FR).]</p>
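        <p>The guided update described above can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors’ implementation; all names are hypothetical, and the rescale factor of 1.5 is the default mentioned in the text.</p>

```python
import numpy as np

def cggd_step(w, loss_grad, constraint_dir, constraints_satisfied,
              lr=0.01, rescale=1.5):
    # Feasible prediction: plain gradient descent on the loss.
    if constraints_satisfied:
        return w - lr * loss_grad
    # Infeasible: rescale the constraint direction to the norm of the
    # loss gradient, then multiply by a factor greater than 1 so that
    # the corrective direction dominates the update step.
    matched = constraint_dir * (np.linalg.norm(loss_grad)
                                / np.linalg.norm(constraint_dir))
    return w - lr * (loss_grad + rescale * matched)

# Toy two-weight example, as in Fig. 1.
w = np.array([1.0, -0.5])
loss_grad = np.array([0.2, 0.1])
constraint_dir = np.array([0.0, 1.0])  # direction pointing towards the FR
w_next = cggd_step(w, loss_grad, constraint_dir, constraints_satisfied=False)
```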
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Constraint-Guided PINN (CGPINN)</title>
        <p>PINNs offer a framework for incorporating physical laws,
expressed as differential equations, directly into the training
of neural networks. The vanilla training objective of a PINN
[5] is defined by

argmin_W  λ · ℒ(Φ(X, W), Y) + (1 − λ) · ℒ_PDE(Φ(X_c, W)),

where the loss function ℒ measures the difference, typically
by considering the Mean Squared Error (MSE), between the
observations Y corresponding to the observed inputs X
and the predictions of the neural network Φ with learnable
weights W. ℒ_PDE measures how well the differential
equation is obeyed by the neural network Φ for (typically) both
the observations and the unobserved collocation samples,
in this work named X_c. The smaller its value, the better it is
obeyed. To balance both terms properly, a hyperparameter
λ is present which needs tuning as indicated before. Note
that the PDE can also be an ordinary differential equation.
This methodology is visualized in Fig. 2.</p>
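        <p>For concreteness, the weighted vanilla-PINN objective can be sketched as a simple function of the data fit and the PDE residuals (an illustrative sketch with hypothetical names, assuming MSE for both terms):</p>

```python
import numpy as np

def vanilla_pinn_loss(pred, target, pde_residual, lam):
    # lam balances the data-fitting MSE against the mean squared PDE residual.
    data_loss = np.mean((pred - target) ** 2)
    pde_loss = np.mean(pde_residual ** 2)
    return lam * data_loss + (1.0 - lam) * pde_loss

# Example: equal weighting of a small data error and a larger PDE residual.
loss = vanilla_pinn_loss(np.array([1.0, 2.0]), np.array([1.0, 1.0]),
                         np.array([2.0]), lam=0.5)  # -> 2.25
```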
        <p>Instead of relying on the manually tuned weighting
parameter λ, CGPINN reframes the training objective as a
constrained optimization problem, where the governing
PDE is enforced directly through explicit constraints. The
new training objective of CGPINN is therefore defined as

argmin_W  ℒ(Φ(X, W), Y)   s.t.   ℒ_PDE(Φ(X_c, W)) ≤ ε,

where the constraint tolerance ε specifies the allowable
deviation of the neural network’s predictions from the
governing PDE. In this work, we initialize ε at a relatively large
value (0.1), corresponding to a large FR, and progressively
reduce it during training.</p>
        <p>However, this PDE constraint
alone still allows for a trivial solution to be found. If both
derivatives in the PDE are zero, the equation is satisfied, but the
result is meaningless. In other words, as long as a trivial
solution satisfies a constraint, it can happen that the model
will converge to it. Therefore, we must prevent this from
happening to ensure a meaningful result is found.</p>
        <p>
          To avoid converging to a trivial solution, where all
derivatives computed via AD are zero, we introduce two additional
sets of constraints that address special cases of the
governing PDE. To illustrate these constraints, we consider the
example of diffusion in a one-dimensional rod with
Dirichlet boundary conditions, as presented in [16]. While this
example is used for clarity, the proposed technique is
general and not limited to any specific type of PDE. For the
diffusion case under consideration, the governing PDE is
given by

∂u/∂t (x, t) = α ∂²u/∂x² (x, t),   (1)

where α is the thermal diffusivity coefficient. For the
Dirichlet boundary conditions, the Initial Conditions (ICs) and the
        </p>
        <sec id="sec-3-2-1">
          <title>Boundary Conditions (BCs) are given by</title>
          <p>IC: u(x, 0) = sin(πx/L),   for x ∈ [0, L],   (2)
BC: u(0, t) = u(L, t) = 0,   for t ∈ [0, T].   (3)</p>
          <p>
            The first additional set of constraints is obtained by
substituting the function (2) into the right-hand side of (1). This
yields

∂u/∂t (x, 0) = −α (π/L)² sin(πx/L),   for t = 0, x ∈ [0, L].

In other words, by combining the spatial partial derivatives
of the initial conditions with the governing PDE, we
derive an additional constraint. If the spatial derivatives are
zero, this constraint will be violated. Consequently, the FR
will exclude models that produce zero spatial derivatives,
effectively preventing such trivial solutions. This set of
constraints will be referred to from now on as ICCon.
          </p>
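          <p>This identity can be checked numerically against the analytical solution u(x, t) = sin(πx/L) · exp(−α(π/L)² t) used in Section 4, here via a central finite difference in time (an illustrative check; the values of L, α and x are arbitrary):</p>

```python
import math

L, alpha = 5.0, 0.04

def u(x, t):
    # analytical solution of the 1-D diffusion problem
    return math.sin(math.pi * x / L) * math.exp(-alpha * (math.pi / L) ** 2 * t)

def dudt(x, t, h=1e-6):
    # central finite difference in time
    return (u(x, t + h) - u(x, t - h)) / (2 * h)

x = 1.7
lhs = dudt(x, 0.0)                                             # du/dt at t = 0
rhs = -alpha * (math.pi / L) ** 2 * math.sin(math.pi * x / L)  # ICCon value
# lhs and rhs agree up to finite-difference error
```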
          <p>
            Similarly, the second set of constraints is obtained by
substituting the function in (3) into the left-hand side of (1).
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>This yields</title>
          <p>0 = α ∂²u/∂x² (x, t),   for x ∈ {0, L}, t ∈ [0, T].</p>
          <p>As with the PDE constraint, a slack variable ε
is introduced that allows some tolerance on the
constraint. To summarize, the optimization objective of
CG</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>PINNs is defined by</title>
          <p>argmin_W  ℒ(Φ(X, W), Y)
s.t.  ℒ_PDE(Φ(X_c, W)) ≤ ε,
      ℒ_ICCon(Φ(X, W)) ≤ ε,
      ℒ_BCCon(Φ(X, W)) ≤ ε.</p>
          <p>This constrained optimization problem can be solved
directly by using CGGD [7]. In this work, the loss function ℒ
is defined as the MSE between the boundary samples and
their ground truth values. The function ℒ_PDE(Φ(X_c, W))
internally uses the required derivatives of the network output
with respect to the input variables, computed using AD. The
resulting residual quantifies how well the PDE is satisfied
at the sampled collocation points X_c. The direction
of each constraint is computed by calculating the
derivative of ℒ_PDE(Φ(X_c, W)). To ensure balanced influence, this
direction vector is scaled to have the same norm as the
corresponding loss gradient, an approach analogous to the
multi-head factor scaling used in [17]. A similar procedure
is applied to the remaining constraint terms.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>To validate our method and compare it against a vanilla
PINN, we implement a heat diffusion experiment based on
[16]. The setup models one-dimensional heat diffusion in
a rod of length L, capturing the temperature distribution
u(x, t) over time t. The governing physical process is
described by the following PDE:

∂u(x, t)/∂t = α ∂²u(x, t)/∂x²</p>
      <p>
        Following [16], we explore several configurations of
the rod length L and thermal diffusivity coefficient α.
Specifically, we investigate the parameter pairs (L, α) ∈
{(5, 0.04), (5, 1), (1, 1), (1, 25)}. These configurations
cover a broad range of physical regimes, from slow to fast
diffusion with varying rod length L. To enable
meaningful comparisons across these settings, each simulation is
run over a time horizon defined by the diffusive time scale
τ = L²/α. This characteristic time scale allows us to
normalize the simulation duration relative to the physical
properties of each setup and ensure that the dynamics are
compared over equivalent stages of diffusion.
      </p>
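      <p>As a quick illustration, the diffusive time scales τ = L²/α for the four configurations work out to:</p>

```python
# tau = L**2 / alpha for each (L, alpha) configuration
configs = [(5, 0.04), (5, 1), (1, 1), (1, 25)]
taus = [L ** 2 / alpha for (L, alpha) in configs]
print(taus)  # -> [625.0, 25.0, 1.0, 0.04]
```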
      <sec id="sec-4-1">
        <title>The analytical solution of the PDE is</title>
        <p>u(x, t) = sin(πx/L) · exp(−α (π/L)² t),

and is used to generate the training, validation, and test sets.</p>
        <p>For the initial condition, spatial points are randomly
sampled from the domain, where u(x, 0) = sin(πx/L).
Boundary values are sampled along the temporal domain at the
Dirichlet boundaries x = 0 and x = L, with fixed
temperatures u(0, t) = u(L, t) = 0. To enforce the PDE,
collocation points are sampled within the spatiotemporal domain
using Latin Hypercube Sampling (LHS) [18]. Fig. 3
provides a visual representation of the domain and the sampled
points.</p>
        <p>The training dataset consists of 128 initial and 128
boundary condition samples along with 1024 sampled collocation
points. The validation and test datasets consist of 1024
collocation samples each. While the training and validation
sets are re-sampled at each iteration to improve
generalization, the test set is randomly sampled once to evaluate the
model’s final performance.</p>
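        <p>The collocation sampling can be sketched with a minimal hand-rolled LHS [18] (for illustration only; library implementations such as scipy.stats.qmc.LatinHypercube could be used instead):</p>

```python
import numpy as np

def latin_hypercube(n, d, rng):
    # One sample per stratum along each dimension, with shuffled strata.
    samples = np.empty((n, d))
    for j in range(d):
        strata = rng.permutation(n)
        samples[:, j] = (strata + rng.random(n)) / n
    return samples

rng = np.random.default_rng(0)
# 1024 collocation points in the (x, t) domain [0, L] x [0, tau],
# here for the (L = 5, alpha = 0.04) setup with tau = L**2 / alpha = 625.
collocation = latin_hypercube(1024, 2, rng) * np.array([5.0, 625.0])
```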
        <p>[Fig. 3: the spatio-temporal domain (t vs. x) with collocation, boundary condition, and initial condition samples.]</p>
        <sec id="sec-4-1-1">
          <title>4.1. Evaluation Metric</title>
          <p>The primary performance indicator for all models is the
test loss, computed as the mean squared (prediction) error
(MSE) on an unseen test set of 1024 points sampled with
LHS from the spatio-temporal domain. The test loss reflects
how well the model generalizes to unseen data and provides
a direct measure of predictive accuracy. It is chosen as the
main metric because it quantifies the discrepancy between
the learned solution and the true analytical solution in a
data-driven and interpretable way.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.2. Network Architecture</title>
          <p>A standard Multilayer Perceptron (MLP) model is used for
the experiments. The network architecture consists of 4
hidden layers, each comprising 50 neurons. The hyperbolic
tangent activation function is applied in all hidden layers, a
common choice in both the PINN and regression literature
due to its smoothness and its capacity to support
higher-order derivatives, which are crucial for accurately modeling
and solving PDEs.</p>
          <p>The network is designed to approximate the solution
(, ) of the PDE. It takes a two-dimensional input vector,
consisting of the spatial coordinate  and the temporal
coordinate , and produces a single scalar output representing
the predicted temperature .</p>
          <p>The network’s forward pass is augmented to not only
compute the output u, but also to leverage AD to calculate
the partial derivatives that are required to enforce
the PDE, namely ∂u/∂x, ∂u/∂t, and ∂²u/∂x². These derivatives are
used to compute the PDE and boundary residuals, which are
needed to enforce the physical constraints during training.
Furthermore, to improve training stability, the input
coordinates (x, t) are scaled to a normalized range before being
processed by the network. The derivatives are calculated
with respect to the original non-normalized input
coordinates so that gradients and constraints can be formulated
in the original physical or domain-specific units, and the
PDE captures the relation between the original physical
quantities.</p>
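          <p>The chain-rule bookkeeping between normalized and physical coordinates can be illustrated with a stand-in function for the network (all names hypothetical; here the spatial input is assumed to be normalized as x_hat = x / L):</p>

```python
import math

L = 5.0  # rod length used to normalize the spatial input (illustrative)

def net(x_hat):
    # stand-in for the MLP evaluated on the normalized coordinate x_hat = x / L
    return math.sin(x_hat)

def dnet_dxhat(x_hat):
    # derivative of the stand-in network w.r.t. its normalized input
    return math.cos(x_hat)

x = 2.0
x_hat = x / L
# Chain rule: derivative w.r.t. the original physical coordinate x picks up
# a factor 1/L (and the second derivative a factor (1/L)**2 analogously).
du_dx = dnet_dxhat(x_hat) * (1.0 / L)
```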
        </sec>
        <sec id="sec-4-1-3">
          <title>4.3. Training Process</title>
          <p>To ensure reproducibility, pseudo-random seeds are set for
Python’s built-in random module, as well as NumPy and
PyTorch. Deterministic behavior is also enabled where
applicable. A base seed, provided via the configuration or script
arguments, serves as the foundation from which individual
seeds are derived for each component that requires one. This
is to ensure that independence between random number
generators is maintained, avoiding unintended correlations
while preserving reproducibility across runs.</p>
          <p>We use the ADAM optimizer [19] in combination with
an exponential learning rate scheduler, following a similar
setup used in [16]. The initial learning rate is set to 10⁻³ and
decays exponentially by a factor of 0.9. If no improvement
occurs within 1000 consecutive epochs, the learning rate is
reduced according to the predefined schedule.</p>
          <p>The constraint tolerance parameter  defines the
allowable error for considering a constraint as satisfied. It is
initially set to 1 and is progressively reduced when a
constraint satisfaction rate of 95% is achieved. This dynamic
adjustment enables automatic tuning of the tolerance for
each experiment and helps prevent the use of overly strict
tolerances, which could otherwise degrade the performance
of CGGD. Whenever the tolerance is dynamically reduced,
the learning rate is reset to the original value to allow the
model to restart learning from a better initialization.</p>
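          <p>This dynamic adjustment can be sketched as follows (illustrative names; the reduction factor of 0.5 is an assumption, as the text does not specify it):</p>

```python
def update_tolerance(eps, lr, satisfaction_rate, base_lr=1e-3,
                     threshold=0.95, factor=0.5):
    # Once the constraint satisfaction rate reaches the threshold,
    # tighten the tolerance and reset the learning rate to its base value.
    if satisfaction_rate >= threshold:
        return eps * factor, base_lr
    return eps, lr

# 97% of constraints satisfied: eps is tightened and the lr is reset.
eps, lr = update_tolerance(1.0, 5e-4, satisfaction_rate=0.97)
```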
          <p>The training objective depends on the executed
experiment. For the vanilla PINN, the objective is a direct
minimization of the loss, which is a weighted sum of MSEs of
the boundary data and the collocation data based on the
hyperparameter λ:</p>
          <p>argmin_W  λ · (1/N) Σ_{i=1..N} (y_obs,i − Φ(x_obs,i, W))²
+ (1 − λ) · (1/N_c) Σ_{j=1..N_c} (ℒ_PDE(Φ(x_j, W)))²</p>
          <p>For our CGPINN method, the objective is determined by
CGGD: satisfy the constraints while simultaneously
minimizing the data loss. Metrics are aggregated over
10-epoch intervals for logging and plot generation.
Checkpointing and other scheduling functionalities, however,
continue to rely on per-epoch metrics directly.</p>
        </sec>
        <sec id="sec-4-1-4">
          <title>4.4. Implementation Details</title>
          <p>The experiments are implemented in Python using PyTorch
[20]. In the experiments, gradients are computed using
PyTorch’s AD functionality. Although it is not strictly
necessary to calculate them in this way, AD provides a
straightforward and reliable method, simplifying the implementation
of gradient-based procedures and reducing the potential
for manual errors in derivative calculations. This approach
allows for efficient experimentation without
compromising flexibility, as alternative gradient computation methods
could also be employed if desired.</p>
          <p>In PyTorch, gradients can only be computed with respect
to leaf tensors. Consequently, it is not possible to compute
the gradient of the constraints directly with respect to the
model output, since output tensors are typically non-leaf
nodes in the computational graph. To address this, we create
a leaf tensor filled with ones, of shape [batch_size, 1], with
gradient tracking enabled. This tensor is element-wise
multiplied with the output to produce a new tensor for which
gradients of the constraints can be computed. The resulting
gradient can then be transformed back to the gradient with
respect to the original output by dividing by the output
values.</p>
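          <p>This workaround can be sketched in PyTorch as follows, using a toy model and a toy constraint for illustration (not the paper’s actual constraint functions):</p>

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(2, 1)
x = torch.randn(8, 2)

y = model(x)                              # non-leaf output tensor
ones = torch.ones(y.shape[0], 1, requires_grad=True)
y_tracked = y * ones                      # tied to a leaf with grad tracking

constraint = (y_tracked ** 2).sum()       # toy constraint term
constraint.backward()

# d(constraint)/d(ones) = d(constraint)/dy * y, so dividing by the output
# values recovers the gradient w.r.t. the output itself (here: 2 * y).
grad_wrt_output = ones.grad / y
```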
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results &amp; Discussion</title>
      <p>We compare the performance of our CGPINN approach
against a standard vanilla PINN across four different
physical configurations, varying the rod length L and thermal
diffusivity α. For the vanilla PINN, we report the test set
MSE loss for five different values of the weighting
hyperparameter λ to showcase its sensitivity. For CGPINN, no such
hyperparameter is needed and experiments were repeated
over 3 different random seeds. In this case, we report the
mean and standard deviation of the test set MSE loss.
(Congrads, a Python toolbox for constraint-guided deep learning, is
available at https://github.com/ML-KULeuven/congrads.)</p>
      <sec id="sec-5-1">
        <title>5.1. Baseline Performance</title>
        <p>The results in Table 2 highlight the sensitivity of the vanilla
PINN to the choice of the hyperparameter λ. For each
physical configuration, the performance varies significantly
across different λ values. For instance, in the (L = 5, α =
0.04) case, the Test MSE changes by an order of
magnitude depending on λ. The optimal value of λ is
inconsistent across different configurations; λ = 0.5 is best
for (L = 1, α = 1), but it performs poorly for the
high-diffusivity case (L = 1, α = 25). This demonstrates that
finding the right balance requires a costly, problem-specific
hyperparameter search. In the challenging high-diffusivity
scenario, the vanilla PINN fails to converge to a good
solution for any tested λ, with MSE values several orders of
magnitude higher than in other cases.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. CGPINN Performance</title>
        <p>For CGPINN, at least
95% of the test samples satisfy all imposed constraints within
the tolerance ε. This
demonstrates the method’s ability to consistently enforce
constraints across different parameter regimes.</p>
        <p>[Table: CGPINN results for the parameter configurations (L, α) ∈ {(5, 0.04), (5, 1), (1, 1), (1, 25)}.]</p>
        <p>The left panel of the corresponding figure
shows the analytical solution of the PDE; the center panel displays the vanilla PINN prediction (small) with hyperparameter
λ = 0.5 and the corresponding difference from the analytical solution (large); the right panel presents the same results for our
CGPINN method.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion &amp; Future work</title>
      <p>In this paper, the framework CGPINN is introduced for
training PINNs using constrained optimization. By
leveraging CGGD, the proposed approach eases the need for
delicate and costly hyperparameter tuning associated with
balancing multiple loss terms in traditional PINNs. Two
novel sets of constraints are proposed, ICCon and BCCon,
which are derived from the governing PDE at the domain
boundaries and prevent the model from learning trivial
solutions.</p>
      <p>The experiments on a 1D heat diffusion problem
demonstrate that CGPINN provides a more stable and robust
training procedure. It achieves performance comparable to, and in
challenging cases superior to, a well-tuned vanilla PINN, without
requiring any sensitive weighting hyperparameters. This
makes the process of developing and training PINNs simpler
and more reliable.</p>
      <p>One possible direction for future work is to apply
CGPINN to more complex, multi-dimensional PDEs. Another
line of follow-up research is to compare CGPINN
to existing state-of-the-art techniques like [11, 13] and to
further investigate the interplay between constraint tolerance
scheduling and learning rate scheduling to further improve
convergence speed and solution accuracy.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This research was supported by the DTF-PINN SBO project
of Flanders Make, the strategic research centre for the
manufacturing industry of Flanders, Belgium, and received
funding from the Flemish Government (AI Research Program).
During the preparation of this work, the authors used
ChatGPT, Microsoft Copilot and Gemini to draft
content, paraphrase and reword, improve writing style, and
check grammar and spelling. After using these tools, the authors
reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
      <sec id="sec-8-1">
        <title>References</title>
        <p>[3] M. Baccouch, A brief summary of the finite element
method for differential equations, in: M.
Baccouch (Ed.), Finite Element Methods and Their
Applications, IntechOpen, London, 2021. doi:10.5772/
intechopen.95423.</p>
        <p>[4] T. G. Grossmann, U. J. Komorowska, J. Latz, C.-B.
Schönlieb, Can physics-informed neural networks
beat the finite element method?, 2023. doi:10.48550/
ARXIV.2302.04107.
[5] M. Raissi, P. Perdikaris, G. Karniadakis,
Physics-informed neural networks: A deep learning
framework for solving forward and inverse problems
involving nonlinear partial differential equations,
Journal of Computational Physics 378 (2019) 686–707.
doi:10.1016/j.jcp.2018.10.045.
[6] S. Basir, Investigating and mitigating failure modes
in physics-informed neural networks (pinns), 2022.
doi:10.48550/ARXIV.2209.09988.
[7] Q. Van Baelen, P. Karsmakers, Constraint guided
gradient descent: Training with inequality constraints with
applications in regression and semantic segmentation,
Neurocomputing 556 (2023) 126636. doi:10.1016/j.
neucom.2023.126636.
[8] P. Rathore, W. Lei, Z. Frangella, L. Lu, M. Udell,
Challenges in training pinns: A loss landscape perspective,
2024. doi:10.48550/ARXIV.2402.01868.
[9] A. S. Krishnapriyan, A. Gholami, S. Zhe, R. M. Kirby,
M. W. Mahoney, Characterizing possible failure modes
in physics-informed neural networks, 2021. doi:10.
48550/ARXIV.2109.01050.
[10] A. Farea, O. Yli-Harja, F. Emmert-Streib,
Understanding physics-informed neural networks: Techniques,
applications, trends, and challenges, AI 5 (2024)
1534–1557. doi:10.3390/ai5030074.
[11] S. Wang, Y. Teng, P. Perdikaris, Understanding
and mitigating gradient flow pathologies in
physics-informed neural networks, SIAM Journal on Scientific
Computing 43 (2021) A3055–A3081. doi:10.1137/
20m1318043.
[12] R. Bischof, M. A. Kraus, Multi-objective loss balancing
for physics-informed deep learning, Computer
Methods in Applied Mechanics and Engineering 439 (2025)
117914. doi:10.1016/j.cma.2025.117914.
[13] Z. Chen, V. Badrinarayanan, C.-Y. Lee, A. Rabinovich,
Gradnorm: Gradient normalization for adaptive loss
balancing in deep multitask networks, 2017. doi:10.
48550/ARXIV.1711.02257.
[14] H. Bi, T. D. Abhayapala, Point neuron
learning: a new physics-informed neural network
architecture, EURASIP Journal on Audio, Speech,
and Music Processing 2024 (2024). doi:10.1186/
s13636-024-00376-0.
[15] M. Tancik, P. P. Srinivasan, B. Mildenhall, S.
Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. T.
Barron, R. Ng, Fourier features let networks learn high
frequency functions in low dimensional domains, 2020.
doi:10.48550/ARXIV.2006.10739.
[16] F. M. Rohrhofer, S. Posch, C. Gößnitzer, B. C. Geiger,
Data vs. physics: The apparent pareto front of
physics-informed neural networks, IEEE Access
11 (2023) 86252–86261. doi:10.1109/ACCESS.2023.
3302892.
[17] Y. Tefera, Q. Van Baelen, M. Meire, S. Luca, P.
Karsmakers, Constraint-guided learning of data-driven health
indicator models: An application on the pronostia
bearing dataset, 2025. doi:10.48550/ARXIV.2503.
09113.
[18] M. D. McKay, R. J. Beckman, W. J. Conover, A
comparison of three methods for selecting values of input
variables in the analysis of output from a computer code,
Technometrics 21 (1979) 239. doi:10.2307/1268522.
[19] D. P. Kingma, J. Ba, Adam: A method for stochastic
optimization, 2014. doi:10.48550/ARXIV.1412.6980.
[20] A. Paszke, S. Gross, F. Massa, A. Lerer, J.
Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein,
L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z.
DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner,
L. Fang, J. Bai, S. Chintala, Pytorch: An imperative
style, high-performance deep learning library, 2019.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yin</surname>
          </string-name>
          , G. E. Karniadakis,
          <article-title>Physics-informed neural networks (pinns) for fluid mechanics: a review</article-title>
          ,
          <source>Acta Mechanica Sinica</source>
          <volume>37</volume>
          (
          <year>2021</year>
          ) [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perdikaris</surname>
          </string-name>
          , G. E. Karniadakis,
          <article-title>Physics-informed neural networks for heat transfer problems</article-title>
          ,
          <source>Journal of Heat Transfer</source>
          <volume>143</volume>
          (
          <year>2021</year>
          ). [4]
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Grossmann</surname>
          </string-name>
          , U. J.
          <string-name>
            <surname>Komorowska</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Latz</surname>
          </string-name>
          , C.-B.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>