Learning Physics-guided Neural Networks with Competing Physics Loss: A Summary of Results in Solving Eigenvalue Problems

Mohannad Elhamod1*, Jie Bu1*, Christopher Singh2, Matthew Redell2, Abantika Ghosh3, Viktor Podolskiy3, Wei-Cheng Lee2, Anuj Karpatne1
1 Department of Computer Science, Virginia Tech; 2 Department of Physics, Binghamton University; 3 Department of Physics and Applied Physics, University of Massachusetts Lowell; * Equal contribution
{elhamod, jayroxis, karpatne}@vt.edu, {csingh5, mredell1, wlee}@binghamton.edu, {abantika, viktor podolskiy}@uml.edu

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Existing work in Physics-guided Neural Networks (PGNNs) has demonstrated the efficacy of adding single PG loss functions in the neural network objectives, using constant trade-off parameters, to ensure better generalizability. However, in the presence of multiple physics loss functions with competing gradient directions, there is a need to adaptively tune the contribution of competing PG loss functions during the course of training to arrive at generalizable solutions. We demonstrate the presence of competing PG losses in the generic neural network problem of solving for the lowest (or highest) eigenvector of a physics-based eigenvalue equation, common to many scientific problems. We present a novel approach to handle competing PG losses and demonstrate its efficacy in learning generalizable solutions in two motivating applications of quantum mechanics and electromagnetic propagation.

1 Introduction

With the increasing impact of deep learning methods in diverse scientific disciplines (Appenzeller 2017; Graham-Rowe et al. 2008), there is a growing realization in the scientific community to harness the power of artificial neural networks (ANNs) without ignoring the rich supervision available in the form of physics knowledge in several scientific problems (Karpatne et al. 2017a; Willard et al. 2020). One of the promising lines of research in this direction is to modify the objective function of neural networks by adding loss functions that measure the violations of ANN outputs with physical equations, termed physics-guided (PG) loss functions (Karpatne et al. 2017b; Stewart and Ermon 2017). By anchoring ANN models to be consistent with physics, PG loss functions have been shown to impart generalizability even in the paucity of training data across several scientific problems (Jia et al. 2019; Karpatne et al. 2017c; Raissi, Perdikaris, and Karniadakis 2019; de Bezenac, Pajot, and Gallinari 2019). We refer to the class of neural networks that are trained using PG loss functions as physics-guided neural networks (PGNNs).

While some existing work in PGNN has attempted to learn neural networks by solely minimizing PG loss (and thus being label-free) (Raissi, Perdikaris, and Karniadakis 2019; Stewart and Ermon 2017), others have used both PG loss and data label loss with appropriate trade-off hyper-parameters (Karpatne et al. 2017c; Jia et al. 2019). However, what is even more challenging is when there are multiple physics equations with competing PG loss functions that need to be minimized together, where each PG loss may show multiple local minima. In such situations, simple addition of PG losses in the objective function with constant trade-off hyper-parameters may result in the learning of non-generalizable solutions. This may seem counter-intuitive, since the addition of PG loss is generally assumed to offer generalizability in the PGNN literature (Karpatne et al. 2017c; de Bezenac, Pajot, and Gallinari 2019; Shin, Darbon, and Karniadakis 2020). This motivates us to ask the question: is it possible to adaptively balance the importance of competing PG loss functions at different stages of neural network learning to arrive at generalizable solutions?

In this work, we introduce a novel framework, CoPhy-PGNN, which is an abbreviation for Competing Physics Physics-Guided Neural Networks, to handle competing PG loss functions in neural network training.
We specifically consider the domain of scientific problems where physics knowledge is represented as eigenvalue equations and we are required to solve for the highest or lowest eigen-solution. This representation is common to many types of physics, such as the Schrödinger equation in the domain of quantum mechanics and Maxwell's equations in the domain of electromagnetic propagation. In these applications, solving eigenvalue equations using exact numerical techniques (e.g., diagonalization methods) can be computationally expensive, especially for large physical systems. On the other hand, PGNN models, once trained, can be applied on testing scenarios to predict their eigen-solutions in drastically smaller running times. We empirically demonstrate the efficacy of our CoPhy-PGNN solution on two diverse applications in quantum mechanics and electromagnetic propagation, highlighting the generalizability of our proposed approach to many physics problems.

2 Background

2.1 Overview of Physics Problems:

The physics of the problem is available in the form of an eigenvalue equation of the form Ây = by, where, for a given input matrix Â, b is an eigenvalue and y is the corresponding eigenvector. We are interested in solving for the lowest or highest eigen-solution of this equation in our target problems.
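To make this setup concrete, the following minimal sketch (an illustration of the generic problem, not part of the paper's pipeline) shows how the lowest eigen-solution of Ây = by can be obtained with an off-the-shelf dense diagonalization routine. The random symmetric matrix used here is a placeholder rather than one of the physical operators studied below.

```python
import numpy as np

# Random symmetric placeholder standing in for a physical operator A-hat.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 16))
A = 0.5 * (A + A.T)  # symmetrize so that the eigenvalues are real

# Dense diagonalization, the kind of exact (but expensive) solver that a
# trained PGNN is meant to replace at test time.
eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues returned in ascending order

b_lowest = eigvals[0]      # lowest eigenvalue b
y_lowest = eigvecs[:, 0]   # corresponding unit-norm eigenvector y

# Sanity check: A y = b y up to numerical precision.
assert np.allclose(A @ y_lowest, b_lowest * y_lowest)
```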
Here, we provide a brief overview of the two target applications.

Quantum Mechanics: In this application, the goal is to predict the ground-state wave function of an Ising chain model with n = 4 particles. This problem can be described by the Schrödinger equation HΨ̂ = ÊΨ̂, where Ê, the energy level, is the eigenvalue; Ψ̂, the wave function, is the eigenvector; and H, the Hamiltonian, is the matrix. Since the ground-state wave function corresponds to the lowest energy level, we are interested in finding the lowest eigen-solution of this eigenvalue equation. To be able to execute a detailed analysis, we choose a small problem scale (n = 4) for this application.

Electromagnetic Propagation: To illustrate our model's scalability to large systems, we consider another application involving the propagation of electromagnetic waves in periodically stratified layer stacks. The description of this propagation can be reduced to the eigenvalue problem Â h⃗_m = k_{zm}² h⃗_m, where k_{zm}², the propagation constant of the electromagnetic modes along the layers, is the eigenvalue, and h⃗_m, the coefficients of the Fourier transform of the spatial profile of the electromagnetic field, is the eigenvector. It is important to note for this application that these quantities are complex-valued, and that we are interested in the largest eigenvalue rather than the smallest.

2.2 Related work in PGNN:

PGNN has found successful applications in several disciplines including fluid dynamics (Wang, Wu, and Xiao 2017, 2016; Wang et al. 2017), climate science (de Bezenac, Pajot, and Gallinari 2019), and lake modeling (Karpatne et al. 2017c; Jia et al. 2019; Daw et al. 2020). However, to the best of our knowledge, PGNN formulations have not been explored yet for our target applications of solving eigenvalue equations in the field of quantum mechanics and electromagnetic propagation. Existing work in PGNN can be broadly divided into two categories. The first category involves label-free learning by only minimizing PG loss without using any labeled data. For example, Physics-informed neural networks (PINNs) and their variants (Raissi, Perdikaris, and Karniadakis 2019, 2017a,b) have been recently developed to solve PDEs by solely minimizing PG loss functions, for simple canonical problems such as Burgers' equation. Since these methods are label-free, they do not explore the interplay between PG loss and label loss. We consider an analogue of PINN for our target application as a baseline in our experiments.

The second category of methods incorporates PG loss as additional terms in the objective function along with label loss, using constant trade-off hyper-parameters. This includes work in basic Physics-guided Neural Networks (PGNNs) (Karpatne et al. 2017c; Jia et al. 2019) for the target application of lake temperature modeling. We use an analogue of this basic PGNN as a baseline in our experiments.

While some recent works have investigated the effects of PG loss on generalization performance (Shin, Darbon, and Karniadakis 2020) and the importance of normalizing the scale of hyper-parameters corresponding to PG loss terms (Wang, Teng, and Perdikaris 2020), they do not study the effects of competing physics losses, which is the focus of this paper. Our work is related to the field of multi-task learning (MTL) (Caruana 1993), as the minimization of physics losses and label loss can be viewed as multiple shared tasks. For example, alternating minimization techniques in MTL (Kang, Grauman, and Sha 2011) can be used to alternate between minimizing different PG loss and label loss terms over different mini-batches. We consider this as a baseline approach in our experiments.

3 Methodology

3.1 Problem statement:

From an ML perspective, we are given a collection of training pairs, D_Tr := {Â_i, (y_i, b_i)}, i = 1, ..., N, where (y_i, b_i) is generated by diagonalization solvers. We consider the problem of learning an ANN model, (ŷ, b̂) = f_NN(Â, θ), that can predict (y, b) for any input matrix Â, where θ are the learnable parameters of the ANN. We are also given a set of unlabeled examples, D_U := {Â_i}, i = 1, ..., M, which will be used for testing. We consider a simple feed-forward architecture of f_NN in all our formulations.

3.2 Designing physics-guided loss functions:

A naïve approach for learning f_NN is to minimize the mean sum of squared errors (MSE) of predictions on the training set, referred to as the Train-MSE. However, instead of solely relying on Train-MSE, we consider the following PG loss terms to guide the learning of f_NN to generalizable solutions.

Characteristic Loss: A fundamental equation we want to satisfy in our predictions, (ŷ, b̂), for any input Â is the eigenvalue equation, Âŷ = b̂ŷ. Hence, we consider minimizing the following loss:

C-Loss(θ) := Σ_i ||Â_i ŷ_i − b̂_i ŷ_i||² / (ŷ_i^⊤ ŷ_i),   (1)

where the denominator term ensures that ŷ resides on a unit hyper-sphere with ||ŷ|| = 1, thus avoiding scaling issues. Note that, by construction, C-Loss only depends on the predictions of f_NN and does not rely on true labels, (y, b). Hence, C-Loss can be evaluated even on the unlabeled test data, D_U.
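A minimal PyTorch sketch of Eq. (1) is given below for the real-valued case, assuming a batch of matrices Â of shape (B, d, d), predicted eigenvectors ŷ of shape (B, d), and predicted eigenvalues b̂ of shape (B,). The function name and shapes are our own illustration, not the authors' released code.

```python
import torch

def c_loss(A, y_hat, b_hat):
    """Characteristic loss of Eq. (1): residual of the eigenvalue equation A y = b y.

    A:     (B, d, d) batch of input matrices
    y_hat: (B, d)    predicted eigenvectors
    b_hat: (B,)      predicted eigenvalues
    """
    Ay = torch.bmm(A, y_hat.unsqueeze(-1)).squeeze(-1)   # A_i y_i for each sample
    residual = Ay - b_hat.unsqueeze(-1) * y_hat          # A_i y_i - b_i y_i
    num = (residual ** 2).sum(dim=-1)                    # squared L2 norm per sample
    den = (y_hat ** 2).sum(dim=-1)                       # y^T y discourages trivial rescaling
    return (num / den).sum()
```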
Spectrum Loss: Note that there are many non-interesting solutions of Âŷ = b̂ŷ that can appear as "local minima" in the optimization landscape of C-Loss. For example, for every input Â_i ∈ D_U, there are d possible eigen-solutions (where d is the length of ŷ), each of which will result in a perfectly low value of C-Loss = 0, thus acting as a local minimum. However, we are only interested in a specific eigenvalue (usually the smallest or the largest) for every Â_i. Therefore, we consider minimizing another PG loss term that ensures the predicted b̂ at every sample is the desired one. In the case of the quantum mechanics application, we use the following loss to find the smallest eigen-solution:

S-Loss(θ) := Σ_i exp(b̂_i).   (2)

The use of the exp function ensures that S-Loss is always positive, even when predicted eigenvalues are negative (which is the case for all energy states, especially the ground state). As for the electromagnetic propagation application, we simply direct the optimization towards the largest eigenvalue by replacing b̂_i with −Re(b̂_i), where Re extracts the real part of the complex eigenvalue. Since in both cases the exp function is applied over negative quantities, S-Loss has smoothly varying gradients.
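The corresponding sketch of Eq. (2) is shown below; the `largest` flag mimics the substitution of b̂_i by −Re(b̂_i) described above, under the assumption that the caller passes in the (real part of the) predicted eigenvalues.

```python
import torch

def s_loss(b_hat, largest=False):
    """Spectrum loss of Eq. (2), steering predictions toward the desired eigenvalue.

    b_hat:   (B,) real-valued predicted eigenvalues (use the real part in the
             complex-valued electromagnetic case)
    largest: if True, negate the eigenvalues so that minimizing the loss favors
             the largest eigenvalue instead of the smallest
    """
    b = -b_hat if largest else b_hat
    return torch.exp(b).sum()
```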
3.3 Adaptive tuning of PG loss weights:

A simple strategy for incorporating PG loss terms in the learning objective of f_NN is to add them to Train-MSE using trade-off weight parameters, λ_C and λ_S, for C-Loss and S-Loss, respectively. Conventionally, such trade-off weights are kept constant at a certain value across all epochs of gradient descent. This inherently assumes that the importance of PG loss terms in guiding the learning of f_NN towards a generalizable solution is constant across all stages (or epochs) of gradient descent, and that they are in agreement with each other. However, in practice, we empirically find that C-Loss, S-Loss, and Train-MSE compete with each other and have varying importance at different stages (or epochs) of ANN learning. Hence, we consider the following ways of adaptively tuning the trade-off weights of C-Loss and S-Loss, λ_C and λ_S, as a function of the epoch number t.

Annealing λ_S: The first observation we make is that S-Loss plays a critical role in the initial stages of learning. Having a large value of λ_S in the beginning few epochs is thus helpful to avoid the selection of local minima and instead converge towards a generalizable solution. Hence, we consider performing a simulated annealing of λ_S that takes on a high value in the beginning epochs and slowly decays to 0 after sufficiently many epochs. Specifically, we consider the following annealing procedure for λ_S:

λ_S(t) = λ_S0 × (1 − α_S)^⌊t/T⌋,   (3)

where λ_S0 is a hyper-parameter denoting the starting value of λ_S at epoch 0, α_S < 1 is a hyper-parameter that controls the rate of annealing, and T is a scaling hyper-parameter.

Cold Starting λ_C: The second observation we make is on the effect of C-Loss on the convergence of gradient descent towards a generalizable solution. Note that C-Loss suffers from a large number of local minima and hence is susceptible to favoring the learning of non-generalizable solutions. Hence, in the beginning epochs, it is important to keep C-Loss turned off. Once we have crossed a sufficient number of epochs and have already zoomed into a region in the parameter space in close vicinity to a generalizable solution, we can safely turn on C-Loss so that it can help refine θ to converge to the generalizable solution. Essentially, we perform "cold starting" of λ_C as given by the following procedure:

λ_C(t) = λ_C0 × sigmoid(α_C × (t − T_a)),   (4)

where λ_C0 is a hyper-parameter denoting the constant value of λ_C after a sufficient number of epochs, α_C is a hyper-parameter that dictates the rate of growth of the sigmoid function, and T_a is a hyper-parameter that controls the cut-off number of epochs after which λ_C is activated from a cold start of 0.

Overall Learning Objective: Combining all of the innovations described above in designing and incorporating PG loss functions, we consider the following overall learning objective:

E(t) = Train-Loss + λ_C(t) C-Loss + λ_S(t) S-Loss.

Note that Train-Loss is only computed over D_Tr, whereas the PG loss terms, C-Loss and S-Loss, are computed over D_Tr as well as the set of unlabeled samples, D_U. We refer to our proposed model trained using the above learning objective as CoPhy-PGNN, which is an abbreviation for Competing Physics PGNN.
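The two schedules and the combined objective can be written compactly as below; the specific hyper-parameter values are placeholders for illustration only, and the floor in Eq. (3) is implemented with integer division.

```python
import math

def lambda_s(t, lambda_s0=1.0, alpha_s=0.1, T=10):
    """Annealed weight of Eq. (3): starts at lambda_s0 and decays toward 0."""
    return lambda_s0 * (1.0 - alpha_s) ** (t // T)

def lambda_c(t, lambda_c0=1.0, alpha_c=0.5, T_a=50):
    """Cold-started weight of Eq. (4): stays near 0 until roughly epoch T_a."""
    return lambda_c0 / (1.0 + math.exp(-alpha_c * (t - T_a)))

# Inside a training loop (sketch of the overall objective E(t)):
#   loss = train_mse + lambda_c(epoch) * c_loss(A, y_hat, b_hat)
#                    + lambda_s(epoch) * s_loss(b_hat)
```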
4 Evaluation setup

Data in Quantum Physics: We considered n = 4 spin systems of Ising chain models for predicting their ground-state wave-function under varying influences of two controlling parameters: B_x and B_z, which represent the strength of the external magnetic field along the X axis (parallel to the direction of the Ising chain) and the Z axis (perpendicular to the direction of the Ising chain), respectively. The Hamiltonian matrix H for these systems is then given as:

H = − Σ_{i=0}^{n−1} σ_i^z σ_{i+1}^z − B_x Σ_{i=0}^{n−1} σ_i^x − B_z Σ_{i=0}^{n−1} σ_i^z,   (5)

where σ^{x,y,z} are Pauli operators and ring boundary conditions are imposed. Note that the size of H is d = 2^n = 16. We set B_z equal to 0.01 to break the ground-state degeneracy, while B_x was sampled from a uniform distribution over the interval [0, 2].

Note that when B_x < 1, the system is said to be in a ferromagnetic phase, since all the spins prefer to either point upward or downward collectively. However, when B_x > 1, the system transitions to a paramagnetic phase, where both upward and downward spins are equally possible. Because the ground-state wave-function behaves differently in the two regions, the system actually exhibits different physical properties. Hence, in order to test for the generalizability of ANN models when training and test distributions are different, we generate training data only from the region deep inside the ferromagnetic phase with B_x < 0.5, while the test data is generated from a much wider range 0 < B_x < 2, covering both ferromagnetic and paramagnetic phases. In particular, the training set comprises N = 100,000 points with B_x uniformly sampled from 0 to 0.5, while the test set comprises M = 20,000 points with B_x uniformly sampled from 0 to 2. For validation, we used sub-sampling on the training set to obtain a validation set of 2000 samples. We performed 10 random runs of uniform sampling over N to show the mean and variance of the performance metrics of comparative ANN models, where at every run a different random initialization of the ANN models is also used. Unless otherwise stated, the results in any experiment are presented over training size N = 2000.
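As an illustration of how such labeled examples could be generated, the sketch below builds the Hamiltonian of Eq. (5) from Kronecker products of Pauli matrices (with ring boundary conditions) and extracts the ground state by dense diagonalization; the value of B_x used here is an arbitrary placeholder.

```python
import numpy as np

# Pauli matrices and the 2x2 identity.
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def site_op(op, i, n):
    """Embed a single-site operator `op` at site i of an n-site chain."""
    mats = [I2] * n
    mats[i] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def ising_hamiltonian(n, Bx, Bz=0.01):
    """Hamiltonian of Eq. (5) with ring boundary conditions (site n wraps to site 0)."""
    d = 2 ** n
    H = np.zeros((d, d))
    for i in range(n):
        H -= site_op(sz, i, n) @ site_op(sz, (i + 1) % n, n)  # -sum_i sigma^z_i sigma^z_{i+1}
        H -= Bx * site_op(sx, i, n)                            # -B_x sum_i sigma^x_i
        H -= Bz * site_op(sz, i, n)                            # -B_z sum_i sigma^z_i
    return H

# Ground state of the 16 x 16 Hamiltonian for n = 4 particles.
H = ising_hamiltonian(n=4, Bx=0.3)
energies, states = np.linalg.eigh(H)
E0, psi0 = energies[0], states[:, 0]
```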
Data in Electromagnetic Propagation: We considered a periodically stratified layer stack of 10 layers of equal length per period. The refractive index n of each layer was randomly assigned an integer value between 1 and 4. Hence, the permittivity ε = n² can take values from {1, 4, 9, 16}. Note that the majority of eigenvalue solvers rely on iterative algorithms and are therefore not easily deployable in GPU environments. To demonstrate the scalability of our approach, we generate N = 2000 realizations of the layered structure. For each example, we also generate the associated Â of size 401 × 401 complex values, making the scale of this problem about 2500 times larger than that of the quantum mechanics problem. The combination of the challenging scale of this eigen-decomposition and the scarcity of training data makes this problem interesting from a scalability and generalizability perspective. To demonstrate extrapolation ability, we take a training set of |D_Tr| = 370 realizations that have a refractive index of only 1 in the first layer. On the other hand, we take a test set of size |D_U| = 1630 with the first layer's refractive index unconstrained (i.e., any value from the set {1, 2, 3, 4}).

Baseline Methods: Since there does not exist any related work in PGNN that has been explored for our target applications, we construct analogue versions, PINN-analogue (Raissi, Perdikaris, and Karniadakis 2019) and PGNN-analogue (Karpatne et al. 2017c), adapted to our problem using their major features. We describe these baselines along with others in the following:
1. Black-box NN (or NN): This refers to the "black-box" ANN model trained just using Train-Loss without any PG loss terms.
2. PGNN-analogue: The analogue version of PGNN (Karpatne et al. 2017c) for our problem, where the hyper-parameters corresponding to S-Loss and C-Loss are set to a constant value.
3. PINN-analogue: The analogue version of PINN (Raissi, Perdikaris, and Karniadakis 2019) for our problem that performs label-free learning only using PG loss terms with constant weights. Note that the PG loss terms are not defined as PDEs in our problem.
4. MTL-PGNN: A multi-task learning (MTL) variant of PGNN where the PG loss terms are optimized alternately (Kang, Grauman, and Sha 2011) by randomly selecting one from all the loss terms for each mini-batch in every epoch.

We also consider the following ablation models:
1. CoPhy-PGNN (only-D_Tr): This is an ablation model where the PG loss terms are only trained over the training set, D_Tr. Comparing our results with this model will help in evaluating the importance of using unlabeled samples D_U in the computation of PG loss.
2. CoPhy-PGNN (w/o S-Loss): This is another ablation model where we only consider C-Loss in the learning objective, while discarding S-Loss.
3. CoPhy-PGNN (Label-free): This ablation model drops Train-MSE from the learning objective and hence performs label-free (LF) learning only using PG loss terms.

Evaluation Metrics: We use two evaluation metrics: (a) Test-MSE, and (b) Cosine Similarity between our predicted eigenvector, ŷ, and the ground-truth, y, averaged across all test samples. We particularly chose the cosine similarity for multiple reasons. First, Euclidean distances are not very meaningful in the high-dimensional spaces of wave-functions, such as the ones we are considering in our analyses. Second, an ideal cosine similarity of 1 provides an intuitive baseline to evaluate the goodness of results. But most importantly, in the electromagnetic propagation application, it is crucial to compare not just the Fourier coefficients of the expansion (which is what the neural net produces) but rather the actual profile of the magnetic field in real space. The accuracy of this prediction can be tested by calculating the overlap integral between the exact and the predicted profiles. That integral, due to the orthogonality of the Fourier expansion, reduces to the cosine similarity. This facilitates testing whether our predicted vectors are valid eigenvectors from a physical standpoint.
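A sketch of the cosine similarity metric for the real-valued case is shown below; taking the absolute value of the inner product, which discounts the arbitrary sign of an eigenvector, is our own assumption rather than a detail spelled out in the text.

```python
import numpy as np

def mean_cosine_similarity(Y_pred, Y_true):
    """Average cosine similarity between predicted and ground-truth eigenvectors.

    Y_pred, Y_true: (N, d) arrays holding one eigenvector per test sample.
    """
    num = np.abs(np.sum(Y_pred * Y_true, axis=1))
    den = np.linalg.norm(Y_pred, axis=1) * np.linalg.norm(Y_true, axis=1)
    return float(np.mean(num / den))
```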
5 Results and analysis

5.1 Quantum Physics Application:

Models | MSE (×10²) | Cosine Similarity
CoPhy-PGNN (proposed) | 0.35 ± 0.12 | 99.50 ± 0.12%
Black-box NN | 1.06 ± 0.16 | 95.32 ± 0.58%
PINN-analogue | 6.27 ± 6.94 | 87.37 ± 12.87%
PGNN-analogue | 0.91 ± 1.90 | 97.97 ± 4.89%
MTL-PGNN | 6.33 ± 2.69 | 84.26 ± 6.33%
CoPhy-PGNN (only-D_Tr) | 1.82 ± 0.36 | 93.61 ± 0.91%
CoPhy-PGNN (w/o S-Loss) | 10.97 ± 0.71 | 76.27 ± 0.80%
CoPhy-PGNN (Label-free) | 9.97 ± 4.42 | 63.97 ± 16.20%

Table 1: Test-MSE and Cosine Similarity of comparative ANN models on training size N = 1000 on the quantum physics application.

Table 1 provides a summary of the comparison of CoPhy-PGNN with baseline methods on the quantum physics application. We can see that our proposed model shows significantly better performance in terms of both Test-MSE and Cosine Similarity. In fact, the cosine similarity of our proposed model is almost 1, indicating an almost perfect fit with test labels. (Note that even a small drop in cosine similarity can lead to cascading errors in the estimation of other physical properties derived from the ground-state wave-function.) An interesting observation from Table 1 is that CoPhy-PGNN (Label-free) actually performs even worse than Black-box NN. This shows that solely relying on PG loss without considering Train-MSE is fraught with challenges in arriving at a generalizable solution. Indeed, using a small number of labeled examples to compute Train-MSE provides a significant nudge to ANN learning to arrive at more accurate solutions. Another interesting observation is that CoPhy-PGNN (only-D_Tr) again performs even worse than Black-box NN. This demonstrates that it is important to use unlabeled samples in D_U, which are representative of the test set, to compute the PG loss. Furthermore, notice that CoPhy-PGNN (w/o S-Loss) actually performs worst across all models, possibly due to the highly non-convex nature of the C-Loss function, which can easily lead to local minima when used without S-Loss. This sheds light on another important aspect of PGNN that is often overlooked: it does not suffice to simply add a PG-Loss term in the objective function in order to achieve generalizable solutions. In fact, an improper use of PG loss can result in worse performance than a black-box model.

Evaluating generalization power: Instead of computing the average cosine similarity across all test samples, Figure 1 analyzes the trends in cosine similarity over test samples with different values of B_x, for four comparative models. Note that none of these models have observed any labeled data during training outside the interval of B_x ∈ [0, 0.5]. Hence, by testing for the cosine similarity over test samples with B_x > 0.5, we are directly testing for the ability of ANN models to generalize outside the data distributions they have been trained upon. Evidently, all label-aware models perform well on the interval of B_x ∈ [0, 0.5]. However, except for CoPhy-PGNN, all baseline models degrade significantly outside that interval, proving their lack of generalizability. Moreover, the label-free model, CoPhy-PGNN (Label-free), is highly erratic and performs poorly across the board.

Figure 1: Cosine Similarity on test samples as a function of B_x for NN, CoPhy-PGNN (only-D_Tr), CoPhy-PGNN, and CoPhy-PGNN (Label-free). The dashed line represents the boundary between the interval used for training (left) and testing (right).

Analysis of loss landscapes: We visualize the landscape of different loss functions w.r.t. the ANN model parameters. In particular, we use the code in (Bernardi 2019) to plot a 2D view of the landscape of different loss functions, namely Train-MSE, Test-MSE, and PG-Loss (the sum of C-Loss and S-Loss), in the neighborhood of a model solution, as shown in Figure 2. The model's parameters are treated with filter normalization as described in (Li et al. 2018), and hence the coordinate values of the axes are unit-less. Also, the model solutions are represented by blue dots. As can be seen, all label-aware models have found a minimum in the Train-MSE landscape. However, when the Test-MSE loss surface is plotted, it is clear that while the CoPhy-PGNN model is still at a minimum, the other baseline models are not. This is a strong indication that using the PG loss with unlabeled data can lead to better extrapolation; it allows the model to generalize beyond in-distribution data. We can see that without using labels, CoPhy-PGNN (Label-free) fails to reach a good minimum of Test-MSE, even though it arrives at a minimum of PG-Loss.

Figure 2: A comprehensive comparison between CoPhy-PGNN and different baseline models, showing the Train-MSE, Test-MSE, and PG-loss landscapes for NN, CoPhy-PGNN (only-D_Tr), CoPhy-PGNN (Label-free), and CoPhy-PGNN. The 1st and 2nd columns show that without using unlabeled data, the model does not generalize well. On the other hand, the 3rd column shows that without labeled data, the model fails to reach a good minimum. Only the last column, our proposed model, shows a good fit across both labeled and unlabeled data. The best performing model is also the model that best optimizes the PG loss.
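For readers who want to reproduce this kind of visualization without the loss-landscapes package, the sketch below evaluates a loss on a 2D slice of parameter space around the trained weights; it uses a per-tensor normalization of the random directions, which is a simplification of the per-filter normalization of Li et al. (2018) used for Figure 2.

```python
import torch

@torch.no_grad()
def loss_grid(model, loss_fn, span=1.0, steps=25):
    """Evaluate loss_fn(model) on a 2D slice of parameter space around the current weights.

    loss_fn: callable taking the model and returning a scalar loss value.
    Returns a (steps, steps) tensor of loss values.
    """
    theta = [p.detach().clone() for p in model.parameters()]

    def random_direction():
        # Random direction with each tensor rescaled to the norm of the corresponding weights.
        dirs = []
        for p in theta:
            d = torch.randn_like(p)
            dirs.append(d * p.norm() / (d.norm() + 1e-12))
        return dirs

    d1, d2 = random_direction(), random_direction()
    alphas = torch.linspace(-span, span, steps)
    grid = torch.zeros(steps, steps)
    for i, a in enumerate(alphas):
        for j, b in enumerate(alphas):
            for p, t, u, v in zip(model.parameters(), theta, d1, d2):
                p.copy_(t + a * u + b * v)   # perturb weights along the 2D slice
            grid[i, j] = float(loss_fn(model))
    for p, t in zip(model.parameters(), theta):  # restore the trained weights
        p.copy_(t)
    return grid
```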
5.2 Electromagnetic Propagation Application:

For this application, the size of Â is 401 × 401, making it a daunting task for an eigensolver in terms of computation time. As a result, a grid-search hyper-parameter tuning of ANN models is prohibitively expensive. This is due to the large number of epochs needed to optimize a model for a problem of this scale. Nonetheless, we were still able to optimize a model to do fairly well by manually adjusting the hyper-parameters and architecture of CoPhy-PGNN to yield acceptable results on the validation set. We emphasize, however, that a more exhaustive tuning could lead to better results that surpass the ones we obtained. Figure 3 shows that CoPhy-PGNN is still able to better extrapolate than a Black-box NN on testing scenarios with permittivity greater than 1. In fact, we have observed that as Black-box NN solely optimizes Train-MSE, its cosine similarity measure deteriorates on the test set. This is in contrast to CoPhy-PGNN's ability to maintain a cosine similarity close to 1 even though its validation loss is comparable to Black-box NN's.

Figure 3: Cosine similarity of CoPhy-PGNN compared to Black-box NN for the electromagnetic propagation application, as a function of the permittivity of the first layer. The dashed line represents the boundary between the interval used for training (left) and testing (right).

While training our model still takes a significant amount of time (about 12 hours), its effectiveness with respect to testing speed is demonstrated in Table 2. We can see that our approach is at least an order of magnitude faster during testing than any numerical eigensolver. This highlights the promise in using neural networks to solve physics-based eigenvalue problems, since, once trained, they can be used to produce eigen-solutions on test points much faster than numerical methods. Further, while CoPhy-PGNN shows higher error than numerical solvers, note that the cosine similarity of our model's predictions with the ground-truth is close to 0.8, thus admitting physical usability.

Solver | average time (seconds) | average |Ây − by|
CoPhy-PGNN | 0.0430 | 1.878 × 10²
numpy.linalg.eig | 93.743 | 7.714 × 10⁻⁶
Matlab | 0.196 | 8.747 × 10⁻¹²
torch.eig | 16.565 | 6.821 × 10⁻¹³
scipy.linalg.eig | 106.223 | 7.538 × 10⁻⁴
scipy.sparse.linalg.eigs | 8.893 | 4.418 × 10⁻³

Table 2: Comparison of speed and accuracy between CoPhy-PGNN and other numerical eigensolvers. Note that Matlab calculates the eigenvalue of interest (i.e., the largest), while the other eigensolvers, except for our proposed method, calculate all the eigenvalues of the given matrix. This explains why Matlab has relatively faster execution time.
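The timing numbers in Table 2 come from the authors' setup; the fragment below merely illustrates how a dense-eigensolver baseline of the quoted size could be timed, using a random complex matrix as a stand-in for an actual Â.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((401, 401)) + 1j * rng.standard_normal((401, 401))

t0 = time.perf_counter()
eigvals, eigvecs = np.linalg.eig(A)   # full dense eigendecomposition
t_eig = time.perf_counter() - t0
print(f"numpy.linalg.eig on a 401x401 complex matrix: {t_eig:.3f} s")

# A trained CoPhy-PGNN would replace this step with a single forward pass, e.g.:
#   t0 = time.perf_counter(); y_hat, b_hat = model(features); t_nn = time.perf_counter() - t0
```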
6 Conclusions and future work

This work proposed novel strategies to address the problem of competing physics loss functions in PGNN. For the general problem of solving eigenvalue equations, we designed a PGNN model, CoPhy-PGNN, and demonstrated its efficacy in two target applications in quantum mechanics and electromagnetic propagation. From our results, we found that: 1) PG loss helps to extrapolate and gives the model better generalizability; and 2) using labeled data along with PG loss results in more stable PGNN models. Moreover, we visualized the loss landscape to give a better understanding of how the combination of both labeled data loss and PG loss leads to better generalization performance. We have also demonstrated the generalizability of our CoPhy-PGNN to multiple application domains with varying types of physics loss functions, as well as its scalability to large systems. Future work can focus on reducing the training time of our model so as to perform extensive hyper-parameter tuning to reach a better global minimum. Finally, while this work empirically demonstrated the value of CoPhy-PGNN in combating competing PG loss terms, future work can focus on theoretical analyses of our approach.

References

Appenzeller, T. 2017. The scientists' apprentice. Science 357(6346): 16–17.

Bernardi, M. D. 2019. loss-landscapes. URL https://github.com/marcellodebernardi/loss-landscapes/.

Caruana, R. 1993. Multitask Learning: A Knowledge-Based Source of Inductive Bias. In Proceedings of the Tenth International Conference on International Conference on Machine Learning, ICML'93, 41–48. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. ISBN 1558603077.

Daw, A.; Thomas, R. Q.; Carey, C. C.; Read, J. S.; Appling, A. P.; and Karpatne, A. 2020. Physics-Guided Architecture (PGA) of Neural Networks for Quantifying Uncertainty in Lake Temperature Modeling. In Proceedings of the 2020 SIAM International Conference on Data Mining, 532–540. SIAM.

de Bezenac, E.; Pajot, A.; and Gallinari, P. 2019. Deep learning for physical processes: Incorporating prior scientific knowledge. Journal of Statistical Mechanics: Theory and Experiment 2019(12): 124009.

Graham-Rowe, D.; Goldston, D.; Doctorow, C.; Waldrop, M.; Lynch, C.; Frankel, F.; Reid, R.; Nelson, S.; Howe, D.; and Rhee, S. 2008. Big data: science in the petabyte era. Nature 455(7209): 8–9.

Jia, X.; Willard, J.; Karpatne, A.; Read, J.; Zwart, J.; Steinbach, M.; and Kumar, V. 2019. Physics Guided RNNs for Modeling Dynamical Systems: A Case Study in Simulating Lake Temperature Profiles. In Proceedings of the 2019 SIAM International Conference on Data Mining, 558–566. SIAM.

Kang, Z.; Grauman, K.; and Sha, F. 2011. Learning with Whom to Share in Multi-Task Feature Learning. In Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML'11, 521–528. Madison, WI, USA: Omnipress. ISBN 9781450306195.

Karpatne, A.; Atluri, G.; Faghmous, J. H.; Steinbach, M.; Banerjee, A.; Ganguly, A.; Shekhar, S.; Samatova, N.; and Kumar, V. 2017a. Theory-guided data science: A new paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering 29(10): 2318–2331.

Karpatne, A.; Atluri, G.; Faghmous, J. H.; Steinbach, M.; Banerjee, A.; Ganguly, A.; Shekhar, S.; Samatova, N.; and Kumar, V. 2017b. Theory-guided data science: A new paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering 29(10): 2318–2331.

Karpatne, A.; Watkins, W.; Read, J.; and Kumar, V. 2017c. Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. arXiv preprint arXiv:1710.11431.
Li, H.; Xu, Z.; Taylor, G.; Studer, C.; and Goldstein, T. 2018. Visualizing the Loss Landscape of Neural Nets. In Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K.; Cesa-Bianchi, N.; and Garnett, R., eds., Advances in Neural Information Processing Systems 31, 6389–6399. Curran Associates, Inc. URL http://papers.nips.cc/paper/7875-visualizing-the-loss-landscape-of-neural-nets.pdf.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. 2017a. Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations. arXiv preprint arXiv:1711.10561.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2017b. Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations. arXiv preprint arXiv:1711.10566.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378: 686–707.

Shin, Y.; Darbon, J.; and Karniadakis, G. E. 2020. On the convergence and generalization of physics informed neural networks. arXiv preprint arXiv:2004.01806.

Stewart, R.; and Ermon, S. 2017. Label-free supervision of neural networks with physics and domain knowledge. In AAAI.

Wang, J.-X.; Wu, J.; Ling, J.; Iaccarino, G.; and Xiao, H. 2017. A Comprehensive Physics-Informed Machine Learning Framework for Predictive Turbulence Modeling. arXiv preprint arXiv:1701.07102.

Wang, J.-X.; Wu, J.-L.; and Xiao, H. 2016. Physics-Informed Machine Learning for Predictive Turbulence Modeling: Using Data to Improve RANS Modeled Reynolds Stresses. arXiv preprint arXiv:1606.07987.

Wang, J.-X.; Wu, J.-L.; and Xiao, H. 2017. Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data. Physical Review Fluids 2(3): 034603.

Wang, S.; Teng, Y.; and Perdikaris, P. 2020. Understanding and mitigating gradient pathologies in physics-informed neural networks. arXiv preprint arXiv:2001.04536.

Willard, J.; Jia, X.; Xu, S.; Steinbach, M.; and Kumar, V. 2020. Integrating physics-based modeling with machine learning: A survey. arXiv preprint arXiv:2003.04919.