Combining Programmable Potentials and Neural Networks for Materials
                                  Problems
    Ryan Mohr,1 Allan Avila,1 Soham Gosh, 2 Ananta Bhattarai, 2 Muqiao Yang, 2 Xintian Feng, 3
       Martin Head-Gordon, 3, 4 Ruslan Salakhutdinov, 2 Maria Fonoberova,1 Igor Mezić,1, 5
                                             1
                                                   AIMdyn, Inc., Santa Barbara, CA 93101
                                        2
                                              Carnegie Mellon University, Pittsburgh, PA 15213
                                                   3
                                                     Q-Chem Inc., Pleasanton, CA 94588
                                               4
                                                 University of California, Berkeley, CA 94720
                                            5
                                              University of California, Santa Barbara, CA 93106
          mohrr@aimdyn.com, allan@aimdyn.com, sohamg@cs.cmu.edu, abhattar@cs.cmu.edu, muqiaoy@cs.cmu.edu,
 xintian@q-chem.com, mhg@cchem.berkeley.edu, rsalakhu@cs.cmu.edu, mfonoberova@aimdyn.com, mezici@aimdyn.com
                               Abstract                                    the nanosecond time scale requires the sequential evalua-
                                                                           tion of roughly a million energy and force evaluations, each
    Force field methods can be used in molecular dynamics simu-
                                                                           of which requires about 100 seconds using 8 cores. This
    lations to compute and infer macro-level properties of chem-
    ical or material systems. However, these force fields can of-          equates to approximately 0.1 billion seconds for a simula-
    ten predict erroneous values due to not being accurate at the          tion, or over 3 years.
    quantum level. The ideal would be able to simulate the sys-               Our approach constructs a generative model by taking
    tem from first principles (quantum-level) and scale up to get          classical molecular dynamics potentials (such as Lennard-
    the material properties. However, this is computationally in-          Jones potentials) which capture the correct long-range be-
    feasible. In this paper, we develop a Programmable Potentials          havior and combines them with small neural networks which
    methodology that utilizes high-fidelity QM/MM data sets to             encode the quantum level logic and correct the near-range
    inform a molecular dynamics (MD) potential. This method-
                                                                           behavior. The encoding functions multiplicatively modify
    ology uses encoding functions containing small neural net-
    works that automatically learn the quantum-level interaction           the classical molecular potentials. Our approach is termed
    logic of the system from the data and uses these encoding              Programmable Potentials with neural networks (PP-NN).
    functions to modify standard MD potentials, like Lennard-                 The formulation of Programmable Potentials (Thakur,
    Jones, which capture the correct long range behavior, but do           Mohr, and Mezić 2016) prior to this work relied on the re-
    poorly in the near range. We test the method on the adsorp-            searcher to manually determine and specify the interaction
    tion of hydrocarbons in a zeolite framework. We show that the          logic and encode this in a boolean algebra. This was a time
    encoding functions, and the resulting model, generalize well           consuming process and possible only for simple reactions.
    outside their training set and that encoding functions trained
    with data from a simpler hydrocarbon such as CH4 can build
                                                                           The idea we present here is to learn this logic automat-
    good models for more complex hydrocarbons such as C2 H6                ically by leveraging neural networks and tunable proxim-
    and C3 H8 .                                                            ity functions, thereby replacing the “hand-tuned” encoding
                                                                           functions. An additional advantage of using neural networks
                                                                           as the encoding functions, and implementing them in pack-
                           Introduction                                    ages such as TensorFlow or PyTorch, allows one to leverage
 We seek to use data from small quantum mechanical simu-                   the auto-differentiation routines of those frameworks, so that
lations to build generative models at the molecular scale in               moving from training to deployment in a molecular dynam-
order to predict bulk material properties. This is not achiev-             ics simulation bypasses an error-prone, manual differentia-
able with current methods. Modeling and solving a system                   tion step to get the force fields.
on the level of the Schrodinger equation scales exponen-                      We applied our methods to the materials science prob-
tially with the number of electrons in the system, making                  lem of hydrocarbons binding in a zeolite framework, both
it infeasible for any system of reasonable size. Density func-             undoped (MFI) and doped (HMFI). Doped zeolites are
tional theory and periodicity assumptions on the system (so-               nano-porous doped silica catalysts that are the world’s most
called QM/MM methods) mitigate this problem somewhat,                      widely used catalytic framework for cracking, dehydrogena-
but are still infeasible for many systems (Jensen 2007). For               tion, and oligomerization of hydrocarbons (Primo and Gar-
example, using QM/MM methods for direct simulation of                      cia 2014). The classic example of doping is replacement
molecular dynamics (MD) for zeolite-catalyzed reactions on                 of a small fraction Si by Al and an accompanying proton
                                                                           to create a solid acid catalyst (Corma 1995). An illustra-
Copyright © 2021, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.                          tion is shown in Figure 1. Doping a zeolite turns a non-
      Copyright ©2021 for this paper by its authors. Use permit-           functional material into a functional material, where cata-
ted under Creative Commons License Attribution 4.0 International           lyst function is controlled by short-range electronic effects
(CC BY 4.0)                                                                from doping, as well as shape-selective effects (pore size and
                                                                    an encoding function that encodes the interaction logic of
                                                                    the system. In bond-breaking reaction systems, the encoding
                                                                    functions are usually restricted to have values between 0 and
                                                                    1 and act as a switch that turns individual bonds on or off. In
                                                                    static systems, like binding of guest molecules such as CH4
                                                                    in a host zeolite framework, the encoding function values are
                                                                    not squashed in this way.
                                                                       In original formulation of Programmable Potentials
                                                                    (Thakur, Mohr, and Mezić 2016), the coarse-level logic of
                                                                    the system was encapsulated in the encoding functions. This
                                                                    logic was first constructed using a Boolean algebra gener-
                                                                    ated by indicator functions on intervals [0, α). Once the logic
                                                                    was constructed, the indicator functions were replaced with
                                                                    smooth variants (see Eq.(2)) whose parameters were then
                                                                    optimized over a training data set. This process could be
                                                                    quite complicated as the system moved past very simple re-
                                                                    actions and would require a lot of man-hours and ingenu-
                                                                    ity. The formation in this paper replaces these hand-tuned
Figure 1: An illustration of the 3-d pore structure of the ze-      encoding functions with encoding functions utilizing small
olite, MFI, showing 437 T sites (Si atoms). Doping is the           neural networks to automatically learn the interaction logic.
replacement of one or more Si atoms by other elements such          The neural networks used were densely connected networks
as Al, Sn, etc, or, as illustrated here, extra-framework atoms      with, generally, no more 8 layers with neurons ranging be-
that attach to the side of pores, often at sites that are already   tween 8 and 64 neurons for a layer.
doped. The properties of the doped site are fundamentally              For reactions between a small number of atoms, we can
different than Si, and, in favorable cases, make functional         apply an encoding function for each interaction. However,
materials with unique properties, controlled by the combi-          for the zeolite problem, using an encoding function for ev-
nation of short-range electronic and steric/shape effects, and      ery interaction is infeasible. This would require optimiz-
long-range electrostatic and dispersion interactions.               ing thousands of neural networks simultaneously due to the
                                                                    number of atoms in the zeolite framework. Therefore, we
                                                                    learned an encoding function for each unique interaction-
geometry) (Smit and Maesen 2008), and long-range effects
                                                                    type and limited them to zeolite atoms close to the hydrocar-
(remote charges, and polarization) from the extended frame-
                                                                    bon. For example, every silicon-carbon interaction used the
work (Mansoor et al. 2018).
                                                                    same encoding function. This reduced the number of encod-
                                                                    ing functions that needed to be trained from thousands to 4
                          Methods                                   (for the undoped zeolite). This also enabled transfer learning
Pure force field methods often do not capture a potential           to other systems by learning a “universal” encoding function
field correctly. While giving the correct form at long range,       for each interaction type.
the short range energies can be incorrect. Ideally, one would          For each unique atom pair type interaction, we use one
like to use a full quantum mechanics simulation, but this           neural network to learn the corresponding encoding func-
is intractable except for small systems. QM/MM hybrid ap-           tion. For example, in the undoped zeolite problem, we have
proaches are introduced to relax these limitations. As men-         four separate neural networks in total, encoding SSiC , SSiH ,
tioned above however, these methods still suffer from com-          SOC , SOH respectively. The networks take the pairwise dis-
putation intractability if one would like to simulate them          tance as input, and outputs the modified Lennard-Jones po-
over a reasonable time scale, which would be needed to ex-          tential. Recall that the modification for a pairwise potential
tract macroscale properties of the material.                        only happens for atoms in the QM region. For those atoms
   Programmable Potentials is a force field methods that            outside QM region, we keep the original LJ pairwise poten-
takes into account the reaction logic of the system, quantum        tial for the energy calculation. The overall process is sum-
or otherwise, given a data set. Since it is a force field method,   marized in Algorithm 1.
this makes it amenable for molecular dynamics simulations.             An additional advantage of using neural networks as the
Using small, high-fidelity QM/MM data sets gives a force            encoding functions, and implementing them in frameworks
field model that can be evaluated outside its training set, set-    such as TensorFlow or PyTorch, allows one leverage the
ting the stage for long time-scale simulations that are still       auto-differentiation methods of those framework so that
informed by quantum level information.                              moving from training to deployment in a molecular dynam-
   The Programmable Potential model takes the form of               ics simulation bypasses an error-prone, manual differentia-
                            XX                                      tion step to get the force fields.
                   EP P =            Vjk Sjk                  (1)      Finally, the choice of the functional form (1), rather than
                              j   k>j                               a pure neural network, gives a generative model. Neural net-
where Vkj is a pairwise potential between atoms j and k,            works interpolate well, but often have difficulty extrapolat-
such as the Lennard-Jones potential in Eq. (3), and Sjk is          ing outside the data sets they were trained on. Our methods
Algorithm 1 Programmable Potentials with Neural Net-
works
  /* Add pairwise potentials for atoms in QM region */
  for atom j in hydrocarbon do
      for zeolite atom k in QM region do
          Epp ← Epp + Vkj (rkj ) · Skj (rkj )
      end for
  end for
                                                                        Figure 3: Basic encoding function architecture.
  /* Modify pairwise potentials for atoms outside QM re-
  gion */
  for atom j in hydrocarbon do
      for zeolite atom k not in QM region do                      manually specified interaction logic. The molecular dynam-
          Epp ← Epp + Vkj (rkj )                                  ics (MD) baseline potential that we are utilizing is a 12-6
      end for                                                     Lennard-Jones (LJ) potentials, shown below in equation (3)
  end for
  return Epp
                                                                                              "             12                     6 #
                                                                                  √                Ri + Rj                  Ri + Rj
                                                                   VLJ (rij ) =       i j                        −2                        ,
                                                                                                     rij                      rij
                              ,=1
               1                                                                                                            (3)
                                                 n=1              where rij is the interatomic distance between atom i and j,
              0.8                                n=4              √
                                                                     i j is the minimum of the potential energy and Ri + Rj
                                                 n=9
                                                                  yield the distance at which the potential is minimal.
   h1;n (r)


              0.6                                n = 16
                                                                     The baseline LJ potential is modified with encoding func-
              0.4
                                                                  tions according to unique atom pair types. Therefore, we
              0.2                                                 only develop encoding functions for Si-C, Si-H, O-C, O-H,
                                                                  Al-C, and Al-H interactions. We have selected an encoding
               0
                    0   0.5     1          1.5            2       function which only modifies the potentials of atoms whose
                                r                                 interatomic distance is within a cut off radius. Furthermore,
                                                                  the encoding function depends only the interatomic distance
 Figure 2: Proximity function with cutoff distance α = 1.         between the zeolite and methane atom of interest. For ex-
                                                                  ample the LJ potential between a carbon and a zeolite atom
                                                                  will be modified by an encoding function that depends only
extrapolate well outside the training set.                        on the interatomic distance between the carbon and zeolite
  The encoding functions Sjk are structured as follows.           atom. The analytic form of the resulting “hand-tuned” po-
Pairwise distances between atoms are the inputs. For each         tential for a Si-C and O-H interaction below.
pairwise distance input, a proximity function is formed. The
proximity function has the form                                            V (rSiC ) = VLJ (rSiC ) ∗ S(rSiC )
                                     1                                               = VLJ (rSiC ) ∗ (1 − h(rSiC ))
                  h(r; α, n) =               ,            (2)                                                                            (4)
                                 1 + ( αr )n                              V (rOH1 ) = VLJ (rOH1 ) ∗ S(rOH1 )
where the parameter α controls the cutoff distance and the                               = VLJ (rOH1 ) ∗ (1 − h(rOH1 )),
parameter n controls how fast the cutoff is. These are both       where in the above equations h(r; α, n) is the standard prox-
tunable parameters allowing the model to determine the cor-       imity function that is used within the programmable poten-
rect scales during optimization. Figure 2 shows proximity         tials methodology.
functions at different parameter values n with a fixed cutoff        As mentioned above, the programmable potential seeks
distance α = 1. The proximity functions and their negations       to modify the LJ potentials of zeolite atoms that fall within
(1 − h) are then fed into a neural network which combines         a cut off radius of the hydrocarbon. This procedure essen-
them to learn the interaction logic from the data set. Figure 3   tially divides the entire system into two parts, where one is
sketches the basic architecture of the encoding function. We      governed by the molecular dynamics interactions and the
call this architecture Programmable Potentials with Neural        other is governed by the quantum mechanic interactions.
Networks (PP-NN). We compare this new architecture with           However, the QM/MM data provided by Q-Chem was gen-
the baseline model consisting of pure Lennard-Jones inter-        erated in a slightly different manner. Specifically, the QM
actions.                                                          region was defined by Q-chem to be composed of the clos-
   In addition to comparing the new PP-NN architecture with       est silicon atom to the hydrocarbon along with eight addi-
the baseline LJ model, we also compare this with a variety        tional neighbors of that silicon atom (4 oxygen and 4 sili-
of Programmable Potentials with manually specified encod-         con atoms). Therefore, the Q-Chem data was generated via
ing functions. As an example, consider developing a pro-          a connectivity based QM region whereas our (hand-tuned)
grammable potential for CH4 in the zeolite framework with         programmable potentials were generated via a radial cut off
QM region. The neural network programmable potential that
we have developed is also a connectivity based potential.
Therefore, we have also focused on developing connectiv-
ity based programmable potentials with hand-tuned logic
to serve as additional baseline for the neural network pro-
grammable potentials.
   We list the models tested for the undoped zeolite problem
below.
 • Case 1: Pure Lennard-Jones (LJ) potential using Q-
   Chem provided LJ parameters. No optimization. This case
   serves as the baseline.
 • Case 2: Pure Lennard-Jones potential with optimization
   of LJ parameters for carbon-hydrogen interactions.
 • Case 3: Radial distance based programmable potential us-
   ing Q-chem provided LJ parameters and optimization of        Figure 4: CH4 +MFI train set error. The T1 data was used for
   carbon-hydrogen encoding functions.                          training the models. Case 1 is the pure LJ baseline. Cases
                                                                5 and NN (case 7) are the most directly comparable. Both
 • Case 4: Radial distance based programmable potentials
                                                                use connectivity-based definitions of the QM region and nei-
   with optimization of the LJ and encoding functions for
                                                                ther optimizes any LJ parameters. The main difference is in
   carbon-hydrogen.
                                                                whether they use hand-tuned logic (case 5) or use neural net-
 • Case 5: Connectivity based programmable potential us-        works (NN).
   ing Q-chem provided LJ parameters and optimization of
   carbon-hydrogen encoding functions.
 • Case 6: Connectivity based programmable potentials with      a Programmable Potentials model trained directly on the
   optimization of the LJ parameters and encoding functions     C2 H6 +MFI T1 symmetry data (see Figure 6). We also tested
   for C/H interactions.                                        transferring the CH4 encoding functions to C3 H8 (Figure 7).
 • Case 7 (NN in plots): Connectivity based programmable        One lesson learned here was that the capacity of the neu-
   potential with neural network using Q-chem provided LJ       ral network needed to match the complexity of the prob-
   parameters. This architecture is the focus of this paper.    lem we wanted to transfer the learning to. For example, the
                                                                first approach used fully-connected neural networks having
                         Results                                4 layers with 8 neurons each. While it adequately learned
                                                                the CH4 +MFI data (with good generalizations to the other
We found that our models generalize well outside the train-
                                                                symmetries), it did not do a good job of transferring to the
ing set. For example, undoped MFI has 12 chemically dis-
                                                                C2 H6 +MFI data sets. Increasing the capacity of the net-
tinguishable silicon atoms, called symmetry regions and la-
                                                                work (5 layers with 16, 32, 64, 32, 16 neurons respectively)
beled T1 to T12. For all models, only the T1 symmetry data
                                                                allowed good transferability. The conjecture is that the net-
was used for training. The 11 other symmetry regions were
                                                                work needs to have enough capacity to match the complexity
used for testing. Despite being trained on less than 10% of
                                                                of the quantum logic of the problem to be transfered to. Cur-
the data, the model beat a pure Lennard-Jones baseline on
                                                                rently, this is done in an ad-hoc manner. Having more for-
the T1 symmetry and also generalized well to the other sym-
                                                                malized or precise heuristics or methods to determine such
metry regions. We compare the results with the true data us-
                                                                a match requires additional effort and could be a topic for a
ing the mean relative error
                                                                future research.
                        n
                     1X         |Ti − Fi |                         Finally, we tested transferring encoding functions from
          M RE =                                 ,        (5)   the undoped MFI framework to the doped HMFI frame-
                     n i=1 maxi (T ) − mini (T )
                                                                work, a more difficult problem (Figure 8). Doped zeolite is
where Ti is the true energy of the system (from high-fidelity   formed by replacing a small fraction of silicon atoms with
QM/MM computations) at sample location i and Fi is the          aluminum atoms. These aluminum atoms present a chal-
energy of the model at sample location i.                       lenge to transferring the encoding functions. This is due
   We also tested how well our trained models generalized       to the undoped MFI data sets not containing any infor-
to other hydrocarbons. For example, we trained encoding         mation about how aluminum reacts with the hydrocarbons.
functions using the CH4 , undoped zeolite (MFI), T1 sym-        The introduction of aluminum can be considered a black
metry data. These trained encoding functions were then used     swan event from the viewpoint of the encoding functions.
to build a model for C2 H6 in MFI. This new model beat a        Despite this, the encoding functions generalized well from
pure Lennard-Jones potential for C2 H6 +MFI on almost all       CH4 +MFI (methane in undoped zeolite) to CH4 +HMFI
the symmetry regions. With a small amount of retraining of      (methane in doped zeolite). Again the encoding functions
the CH4 -learned encoding functions using C2 H6 +MFI T1         were only trained using CH4 +MFI T1 symmetry data and
symmetry data, the model beat a pure Lennard-Jones poten-       generalized well to the other symmetry regions of CH4 in
tial on all symmetry regions and was comparable or beat         doped zeolite (of which there are only 10 instead of 12).
                                                                 Figure 6: CH4 +MFI generalization to C2 H6 +MFI. Base-
                                                                 line (blue) is a pure LJ potential on the C2 H6 +MFI data set.
                                                                 CH4 without adaptation (orange) uses the encoding func-
                                                                 tions trained using the CH4 +MFI T1 data set to build a Pro-
Figure 5: CH4 +MFI average test set error. The models were       grammable Potential for C2 H6 +MFI. CH4 with adaptation
trained on the T1 data set and tested on the T2–T12 symme-       on C2 H6 (green) is the same of the previous model except
try data. The MRE on the data sets for each test symmetry        now a small amount of training epochs using the T1 sym-
region was computed and then averaged. Cases 5 and NN            metry data set of C2 H6 +MFI is used to update the encod-
(case 7) are the most directly comparable.                       ing functions. The final model (red) uses encoding functions
                                                                 trained from scratch on the C2 H6 +MFI T1 data set without
                                                                 leveraging on information from CH4 +MFI.
                      Conclusions
We have presented the Programmable Potentials with Neu-
ral Networks (PP-NN) methodology in the context adsorb-
ing hydrocarbons in zeolite frameworks. The methodol-
ogy builds generative models out of classical molecular dy-
namics potentials—such as Lennard-Jones potentials, which
give the correct long-range behavior—and small, targeted
neural networks that learn the quantum level logic between
atoms if given QM/MM data sets or other quantum mechan-
ical data sets.
   The learned encoding functions can be trained on sim-
pler hydrocarbons and transmitted to more complex hydro-
carbons and beat the baseline Lennard-Jones potential. Us-       Figure 7: CH4 +MFI generalization to C3 H8 +MFI (un-
ing the previous trained encoding functions as initial condi-    doped to doped zeolite). The same as in Figure 6 except all
tions for the more complex hydrocarbons, gives better per-       of the C2 H6 +MFI data sets were replaced with C3 H8 +MFI
formance than an encoding function that was randomly ini-        data sets.
tialized and trained on the more complex hydrocarbons data
set. This ability to transfer seems to require a higher capac-
ity network than is needed to just learn the potential for the   once trained, the PP-NN has an execution time on the order
simpler hydrocarbon.                                             of classical force field methods, molecular dynamics simula-
   Looking forward, the computational catalysis community        tions having quantum-level accuracy could be simulated in
stands to benefit from the development of efficient, accu-       hours or days rather than years for a comparable QM/MM
rate and transferable machine-learning architectures such as     simulation.
we have developed in this program. This community has
high standards for acceptable accuracy, because significant
errors in the predicted energies enter observable quantities
                                                                                   Acknowledgements
such as relative populations of binding sites, turnover fre-     This work was supported under DARPA contract HR0011-
quencies, etc., in the exponential of Boltzmann factors or       19-9-0026.
Arrenhius rate constants. Small errors in the relative binding
energies can lead to erroneous predictions for gas adsorp-                              References
tion, and likewise can even lead to erroneous predictions
of reaction mechanisms. The goal in these communities is         Corma, A. 1995. Inorganic Solid Acids and Their Use in
to have ML learned force fields that have the accuracy of        Acid-Catalyzed Hydrocarbon Reactions. Chem. Rev. 95(3):
                                                                 559–614.
quantum methods, but at the cost of force field methods. The
Programmable Potentials neural network (PP-NN) method-           Jensen, F. 2007. Introduction to Computational Chemistry.
ology is a promising initial step towards this goal. Since,      John Wiley & Sons, second edition.
Figure 8: CH4 +MFI generalization to CH4 +HMFI. The
encoding functions trained using CH4 +MFI (methane in
undoped zeolite) T1 symmetry data is used to build Pro-
grammable Potential models for CH4 +HMFI (methane in
doped zeolite). Baseline (blue) is a pure LJ potential. CH4
without adaptation uses the already trained encoding func-
tions and does not do do any retraining. CH4 with adap-
tation uses the trained encoding functions and trains for a
small number of epochs using the CH4 +HMFI T1 symme-
try data.x


Mansoor, E.; Van der Mynsbrugge, J.; Head-Gordon, M.; and
Bell, A. T. 2018. Impact of long-range electrostatic and dis-
persive interactions on theoretical predictions of adsorption
and catalysis in zeolites. Catal. Today 312(SI): 51–65.
Primo, A.; and Garcia, H. 2014. Zeolites as catalysts in oil
refining. Chem. Soc. Rev. 43(22): 7548–7561.
Smit, B.; and Maesen, T. L. M. 2008. Towards a molecular
understanding of shape selectivity. Nature 451(7179): 671–
678.
Thakur, G. S.; Mohr, R.; and Mezić, I. 2016. Programmable
Potentials: Approximate N-body potentials from coarse-level
logic. Scientific Reports 6(1): 33415.