Combining Programmable Potentials and Neural Networks for Materials Problems Ryan Mohr,1 Allan Avila,1 Soham Gosh, 2 Ananta Bhattarai, 2 Muqiao Yang, 2 Xintian Feng, 3 Martin Head-Gordon, 3, 4 Ruslan Salakhutdinov, 2 Maria Fonoberova,1 Igor Mezić,1, 5 1 AIMdyn, Inc., Santa Barbara, CA 93101 2 Carnegie Mellon University, Pittsburgh, PA 15213 3 Q-Chem Inc., Pleasanton, CA 94588 4 University of California, Berkeley, CA 94720 5 University of California, Santa Barbara, CA 93106 mohrr@aimdyn.com, allan@aimdyn.com, sohamg@cs.cmu.edu, abhattar@cs.cmu.edu, muqiaoy@cs.cmu.edu, xintian@q-chem.com, mhg@cchem.berkeley.edu, rsalakhu@cs.cmu.edu, mfonoberova@aimdyn.com, mezici@aimdyn.com Abstract the nanosecond time scale requires the sequential evalua- tion of roughly a million energy and force evaluations, each Force field methods can be used in molecular dynamics simu- of which requires about 100 seconds using 8 cores. This lations to compute and infer macro-level properties of chem- ical or material systems. However, these force fields can of- equates to approximately 0.1 billion seconds for a simula- ten predict erroneous values due to not being accurate at the tion, or over 3 years. quantum level. The ideal would be able to simulate the sys- Our approach constructs a generative model by taking tem from first principles (quantum-level) and scale up to get classical molecular dynamics potentials (such as Lennard- the material properties. However, this is computationally in- Jones potentials) which capture the correct long-range be- feasible. In this paper, we develop a Programmable Potentials havior and combines them with small neural networks which methodology that utilizes high-fidelity QM/MM data sets to encode the quantum level logic and correct the near-range inform a molecular dynamics (MD) potential. This method- behavior. The encoding functions multiplicatively modify ology uses encoding functions containing small neural net- works that automatically learn the quantum-level interaction the classical molecular potentials. Our approach is termed logic of the system from the data and uses these encoding Programmable Potentials with neural networks (PP-NN). functions to modify standard MD potentials, like Lennard- The formulation of Programmable Potentials (Thakur, Jones, which capture the correct long range behavior, but do Mohr, and Mezić 2016) prior to this work relied on the re- poorly in the near range. We test the method on the adsorp- searcher to manually determine and specify the interaction tion of hydrocarbons in a zeolite framework. We show that the logic and encode this in a boolean algebra. This was a time encoding functions, and the resulting model, generalize well consuming process and possible only for simple reactions. outside their training set and that encoding functions trained with data from a simpler hydrocarbon such as CH4 can build The idea we present here is to learn this logic automat- good models for more complex hydrocarbons such as C2 H6 ically by leveraging neural networks and tunable proxim- and C3 H8 . ity functions, thereby replacing the “hand-tuned” encoding functions. An additional advantage of using neural networks as the encoding functions, and implementing them in pack- Introduction ages such as TensorFlow or PyTorch, allows one to leverage We seek to use data from small quantum mechanical simu- the auto-differentiation routines of those frameworks, so that lations to build generative models at the molecular scale in moving from training to deployment in a molecular dynam- order to predict bulk material properties. This is not achiev- ics simulation bypasses an error-prone, manual differentia- able with current methods. Modeling and solving a system tion step to get the force fields. on the level of the Schrodinger equation scales exponen- We applied our methods to the materials science prob- tially with the number of electrons in the system, making lem of hydrocarbons binding in a zeolite framework, both it infeasible for any system of reasonable size. Density func- undoped (MFI) and doped (HMFI). Doped zeolites are tional theory and periodicity assumptions on the system (so- nano-porous doped silica catalysts that are the world’s most called QM/MM methods) mitigate this problem somewhat, widely used catalytic framework for cracking, dehydrogena- but are still infeasible for many systems (Jensen 2007). For tion, and oligomerization of hydrocarbons (Primo and Gar- example, using QM/MM methods for direct simulation of cia 2014). The classic example of doping is replacement molecular dynamics (MD) for zeolite-catalyzed reactions on of a small fraction Si by Al and an accompanying proton to create a solid acid catalyst (Corma 1995). An illustra- Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. tion is shown in Figure 1. Doping a zeolite turns a non- Copyright ©2021 for this paper by its authors. Use permit- functional material into a functional material, where cata- ted under Creative Commons License Attribution 4.0 International lyst function is controlled by short-range electronic effects (CC BY 4.0) from doping, as well as shape-selective effects (pore size and an encoding function that encodes the interaction logic of the system. In bond-breaking reaction systems, the encoding functions are usually restricted to have values between 0 and 1 and act as a switch that turns individual bonds on or off. In static systems, like binding of guest molecules such as CH4 in a host zeolite framework, the encoding function values are not squashed in this way. In original formulation of Programmable Potentials (Thakur, Mohr, and Mezić 2016), the coarse-level logic of the system was encapsulated in the encoding functions. This logic was first constructed using a Boolean algebra gener- ated by indicator functions on intervals [0, α). Once the logic was constructed, the indicator functions were replaced with smooth variants (see Eq.(2)) whose parameters were then optimized over a training data set. This process could be quite complicated as the system moved past very simple re- actions and would require a lot of man-hours and ingenu- ity. The formation in this paper replaces these hand-tuned Figure 1: An illustration of the 3-d pore structure of the ze- encoding functions with encoding functions utilizing small olite, MFI, showing 437 T sites (Si atoms). Doping is the neural networks to automatically learn the interaction logic. replacement of one or more Si atoms by other elements such The neural networks used were densely connected networks as Al, Sn, etc, or, as illustrated here, extra-framework atoms with, generally, no more 8 layers with neurons ranging be- that attach to the side of pores, often at sites that are already tween 8 and 64 neurons for a layer. doped. The properties of the doped site are fundamentally For reactions between a small number of atoms, we can different than Si, and, in favorable cases, make functional apply an encoding function for each interaction. However, materials with unique properties, controlled by the combi- for the zeolite problem, using an encoding function for ev- nation of short-range electronic and steric/shape effects, and ery interaction is infeasible. This would require optimiz- long-range electrostatic and dispersion interactions. ing thousands of neural networks simultaneously due to the number of atoms in the zeolite framework. Therefore, we learned an encoding function for each unique interaction- geometry) (Smit and Maesen 2008), and long-range effects type and limited them to zeolite atoms close to the hydrocar- (remote charges, and polarization) from the extended frame- bon. For example, every silicon-carbon interaction used the work (Mansoor et al. 2018). same encoding function. This reduced the number of encod- ing functions that needed to be trained from thousands to 4 Methods (for the undoped zeolite). This also enabled transfer learning Pure force field methods often do not capture a potential to other systems by learning a “universal” encoding function field correctly. While giving the correct form at long range, for each interaction type. the short range energies can be incorrect. Ideally, one would For each unique atom pair type interaction, we use one like to use a full quantum mechanics simulation, but this neural network to learn the corresponding encoding func- is intractable except for small systems. QM/MM hybrid ap- tion. For example, in the undoped zeolite problem, we have proaches are introduced to relax these limitations. As men- four separate neural networks in total, encoding SSiC , SSiH , tioned above however, these methods still suffer from com- SOC , SOH respectively. The networks take the pairwise dis- putation intractability if one would like to simulate them tance as input, and outputs the modified Lennard-Jones po- over a reasonable time scale, which would be needed to ex- tential. Recall that the modification for a pairwise potential tract macroscale properties of the material. only happens for atoms in the QM region. For those atoms Programmable Potentials is a force field methods that outside QM region, we keep the original LJ pairwise poten- takes into account the reaction logic of the system, quantum tial for the energy calculation. The overall process is sum- or otherwise, given a data set. Since it is a force field method, marized in Algorithm 1. this makes it amenable for molecular dynamics simulations. An additional advantage of using neural networks as the Using small, high-fidelity QM/MM data sets gives a force encoding functions, and implementing them in frameworks field model that can be evaluated outside its training set, set- such as TensorFlow or PyTorch, allows one leverage the ting the stage for long time-scale simulations that are still auto-differentiation methods of those framework so that informed by quantum level information. moving from training to deployment in a molecular dynam- The Programmable Potential model takes the form of ics simulation bypasses an error-prone, manual differentia- XX tion step to get the force fields. EP P = Vjk Sjk (1) Finally, the choice of the functional form (1), rather than j k>j a pure neural network, gives a generative model. Neural net- where Vkj is a pairwise potential between atoms j and k, works interpolate well, but often have difficulty extrapolat- such as the Lennard-Jones potential in Eq. (3), and Sjk is ing outside the data sets they were trained on. Our methods Algorithm 1 Programmable Potentials with Neural Net- works /* Add pairwise potentials for atoms in QM region */ for atom j in hydrocarbon do for zeolite atom k in QM region do Epp ← Epp + Vkj (rkj ) · Skj (rkj ) end for end for Figure 3: Basic encoding function architecture. /* Modify pairwise potentials for atoms outside QM re- gion */ for atom j in hydrocarbon do for zeolite atom k not in QM region do manually specified interaction logic. The molecular dynam- Epp ← Epp + Vkj (rkj ) ics (MD) baseline potential that we are utilizing is a 12-6 end for Lennard-Jones (LJ) potentials, shown below in equation (3) end for return Epp " 12  6 # √ Ri + Rj Ri + Rj VLJ (rij ) = i j −2 , rij rij ,=1 1 (3) n=1 where rij is the interatomic distance between atom i and j, 0.8 n=4 √ i j is the minimum of the potential energy and Ri + Rj n=9 yield the distance at which the potential is minimal. h1;n (r) 0.6 n = 16 The baseline LJ potential is modified with encoding func- 0.4 tions according to unique atom pair types. Therefore, we 0.2 only develop encoding functions for Si-C, Si-H, O-C, O-H, Al-C, and Al-H interactions. We have selected an encoding 0 0 0.5 1 1.5 2 function which only modifies the potentials of atoms whose r interatomic distance is within a cut off radius. Furthermore, the encoding function depends only the interatomic distance Figure 2: Proximity function with cutoff distance α = 1. between the zeolite and methane atom of interest. For ex- ample the LJ potential between a carbon and a zeolite atom will be modified by an encoding function that depends only extrapolate well outside the training set. on the interatomic distance between the carbon and zeolite The encoding functions Sjk are structured as follows. atom. The analytic form of the resulting “hand-tuned” po- Pairwise distances between atoms are the inputs. For each tential for a Si-C and O-H interaction below. pairwise distance input, a proximity function is formed. The proximity function has the form V (rSiC ) = VLJ (rSiC ) ∗ S(rSiC ) 1 = VLJ (rSiC ) ∗ (1 − h(rSiC )) h(r; α, n) = , (2) (4) 1 + ( αr )n V (rOH1 ) = VLJ (rOH1 ) ∗ S(rOH1 ) where the parameter α controls the cutoff distance and the = VLJ (rOH1 ) ∗ (1 − h(rOH1 )), parameter n controls how fast the cutoff is. These are both where in the above equations h(r; α, n) is the standard prox- tunable parameters allowing the model to determine the cor- imity function that is used within the programmable poten- rect scales during optimization. Figure 2 shows proximity tials methodology. functions at different parameter values n with a fixed cutoff As mentioned above, the programmable potential seeks distance α = 1. The proximity functions and their negations to modify the LJ potentials of zeolite atoms that fall within (1 − h) are then fed into a neural network which combines a cut off radius of the hydrocarbon. This procedure essen- them to learn the interaction logic from the data set. Figure 3 tially divides the entire system into two parts, where one is sketches the basic architecture of the encoding function. We governed by the molecular dynamics interactions and the call this architecture Programmable Potentials with Neural other is governed by the quantum mechanic interactions. Networks (PP-NN). We compare this new architecture with However, the QM/MM data provided by Q-Chem was gen- the baseline model consisting of pure Lennard-Jones inter- erated in a slightly different manner. Specifically, the QM actions. region was defined by Q-chem to be composed of the clos- In addition to comparing the new PP-NN architecture with est silicon atom to the hydrocarbon along with eight addi- the baseline LJ model, we also compare this with a variety tional neighbors of that silicon atom (4 oxygen and 4 sili- of Programmable Potentials with manually specified encod- con atoms). Therefore, the Q-Chem data was generated via ing functions. As an example, consider developing a pro- a connectivity based QM region whereas our (hand-tuned) grammable potential for CH4 in the zeolite framework with programmable potentials were generated via a radial cut off QM region. The neural network programmable potential that we have developed is also a connectivity based potential. Therefore, we have also focused on developing connectiv- ity based programmable potentials with hand-tuned logic to serve as additional baseline for the neural network pro- grammable potentials. We list the models tested for the undoped zeolite problem below. • Case 1: Pure Lennard-Jones (LJ) potential using Q- Chem provided LJ parameters. No optimization. This case serves as the baseline. • Case 2: Pure Lennard-Jones potential with optimization of LJ parameters for carbon-hydrogen interactions. • Case 3: Radial distance based programmable potential us- ing Q-chem provided LJ parameters and optimization of Figure 4: CH4 +MFI train set error. The T1 data was used for carbon-hydrogen encoding functions. training the models. Case 1 is the pure LJ baseline. Cases 5 and NN (case 7) are the most directly comparable. Both • Case 4: Radial distance based programmable potentials use connectivity-based definitions of the QM region and nei- with optimization of the LJ and encoding functions for ther optimizes any LJ parameters. The main difference is in carbon-hydrogen. whether they use hand-tuned logic (case 5) or use neural net- • Case 5: Connectivity based programmable potential us- works (NN). ing Q-chem provided LJ parameters and optimization of carbon-hydrogen encoding functions. • Case 6: Connectivity based programmable potentials with a Programmable Potentials model trained directly on the optimization of the LJ parameters and encoding functions C2 H6 +MFI T1 symmetry data (see Figure 6). We also tested for C/H interactions. transferring the CH4 encoding functions to C3 H8 (Figure 7). • Case 7 (NN in plots): Connectivity based programmable One lesson learned here was that the capacity of the neu- potential with neural network using Q-chem provided LJ ral network needed to match the complexity of the prob- parameters. This architecture is the focus of this paper. lem we wanted to transfer the learning to. For example, the first approach used fully-connected neural networks having Results 4 layers with 8 neurons each. While it adequately learned the CH4 +MFI data (with good generalizations to the other We found that our models generalize well outside the train- symmetries), it did not do a good job of transferring to the ing set. For example, undoped MFI has 12 chemically dis- C2 H6 +MFI data sets. Increasing the capacity of the net- tinguishable silicon atoms, called symmetry regions and la- work (5 layers with 16, 32, 64, 32, 16 neurons respectively) beled T1 to T12. For all models, only the T1 symmetry data allowed good transferability. The conjecture is that the net- was used for training. The 11 other symmetry regions were work needs to have enough capacity to match the complexity used for testing. Despite being trained on less than 10% of of the quantum logic of the problem to be transfered to. Cur- the data, the model beat a pure Lennard-Jones baseline on rently, this is done in an ad-hoc manner. Having more for- the T1 symmetry and also generalized well to the other sym- malized or precise heuristics or methods to determine such metry regions. We compare the results with the true data us- a match requires additional effort and could be a topic for a ing the mean relative error future research. n 1X |Ti − Fi | Finally, we tested transferring encoding functions from M RE = , (5) the undoped MFI framework to the doped HMFI frame- n i=1 maxi (T ) − mini (T ) work, a more difficult problem (Figure 8). Doped zeolite is where Ti is the true energy of the system (from high-fidelity formed by replacing a small fraction of silicon atoms with QM/MM computations) at sample location i and Fi is the aluminum atoms. These aluminum atoms present a chal- energy of the model at sample location i. lenge to transferring the encoding functions. This is due We also tested how well our trained models generalized to the undoped MFI data sets not containing any infor- to other hydrocarbons. For example, we trained encoding mation about how aluminum reacts with the hydrocarbons. functions using the CH4 , undoped zeolite (MFI), T1 sym- The introduction of aluminum can be considered a black metry data. These trained encoding functions were then used swan event from the viewpoint of the encoding functions. to build a model for C2 H6 in MFI. This new model beat a Despite this, the encoding functions generalized well from pure Lennard-Jones potential for C2 H6 +MFI on almost all CH4 +MFI (methane in undoped zeolite) to CH4 +HMFI the symmetry regions. With a small amount of retraining of (methane in doped zeolite). Again the encoding functions the CH4 -learned encoding functions using C2 H6 +MFI T1 were only trained using CH4 +MFI T1 symmetry data and symmetry data, the model beat a pure Lennard-Jones poten- generalized well to the other symmetry regions of CH4 in tial on all symmetry regions and was comparable or beat doped zeolite (of which there are only 10 instead of 12). Figure 6: CH4 +MFI generalization to C2 H6 +MFI. Base- line (blue) is a pure LJ potential on the C2 H6 +MFI data set. CH4 without adaptation (orange) uses the encoding func- tions trained using the CH4 +MFI T1 data set to build a Pro- Figure 5: CH4 +MFI average test set error. The models were grammable Potential for C2 H6 +MFI. CH4 with adaptation trained on the T1 data set and tested on the T2–T12 symme- on C2 H6 (green) is the same of the previous model except try data. The MRE on the data sets for each test symmetry now a small amount of training epochs using the T1 sym- region was computed and then averaged. Cases 5 and NN metry data set of C2 H6 +MFI is used to update the encod- (case 7) are the most directly comparable. ing functions. The final model (red) uses encoding functions trained from scratch on the C2 H6 +MFI T1 data set without leveraging on information from CH4 +MFI. Conclusions We have presented the Programmable Potentials with Neu- ral Networks (PP-NN) methodology in the context adsorb- ing hydrocarbons in zeolite frameworks. The methodol- ogy builds generative models out of classical molecular dy- namics potentials—such as Lennard-Jones potentials, which give the correct long-range behavior—and small, targeted neural networks that learn the quantum level logic between atoms if given QM/MM data sets or other quantum mechan- ical data sets. The learned encoding functions can be trained on sim- pler hydrocarbons and transmitted to more complex hydro- carbons and beat the baseline Lennard-Jones potential. Us- Figure 7: CH4 +MFI generalization to C3 H8 +MFI (un- ing the previous trained encoding functions as initial condi- doped to doped zeolite). The same as in Figure 6 except all tions for the more complex hydrocarbons, gives better per- of the C2 H6 +MFI data sets were replaced with C3 H8 +MFI formance than an encoding function that was randomly ini- data sets. tialized and trained on the more complex hydrocarbons data set. This ability to transfer seems to require a higher capac- ity network than is needed to just learn the potential for the once trained, the PP-NN has an execution time on the order simpler hydrocarbon. of classical force field methods, molecular dynamics simula- Looking forward, the computational catalysis community tions having quantum-level accuracy could be simulated in stands to benefit from the development of efficient, accu- hours or days rather than years for a comparable QM/MM rate and transferable machine-learning architectures such as simulation. we have developed in this program. This community has high standards for acceptable accuracy, because significant errors in the predicted energies enter observable quantities Acknowledgements such as relative populations of binding sites, turnover fre- This work was supported under DARPA contract HR0011- quencies, etc., in the exponential of Boltzmann factors or 19-9-0026. Arrenhius rate constants. Small errors in the relative binding energies can lead to erroneous predictions for gas adsorp- References tion, and likewise can even lead to erroneous predictions of reaction mechanisms. The goal in these communities is Corma, A. 1995. Inorganic Solid Acids and Their Use in to have ML learned force fields that have the accuracy of Acid-Catalyzed Hydrocarbon Reactions. Chem. Rev. 95(3): 559–614. quantum methods, but at the cost of force field methods. The Programmable Potentials neural network (PP-NN) method- Jensen, F. 2007. Introduction to Computational Chemistry. ology is a promising initial step towards this goal. Since, John Wiley & Sons, second edition. Figure 8: CH4 +MFI generalization to CH4 +HMFI. The encoding functions trained using CH4 +MFI (methane in undoped zeolite) T1 symmetry data is used to build Pro- grammable Potential models for CH4 +HMFI (methane in doped zeolite). Baseline (blue) is a pure LJ potential. CH4 without adaptation uses the already trained encoding func- tions and does not do do any retraining. CH4 with adap- tation uses the trained encoding functions and trains for a small number of epochs using the CH4 +HMFI T1 symme- try data.x Mansoor, E.; Van der Mynsbrugge, J.; Head-Gordon, M.; and Bell, A. T. 2018. Impact of long-range electrostatic and dis- persive interactions on theoretical predictions of adsorption and catalysis in zeolites. Catal. Today 312(SI): 51–65. Primo, A.; and Garcia, H. 2014. Zeolites as catalysts in oil refining. Chem. Soc. Rev. 43(22): 7548–7561. Smit, B.; and Maesen, T. L. M. 2008. Towards a molecular understanding of shape selectivity. Nature 451(7179): 671– 678. Thakur, G. S.; Mohr, R.; and Mezić, I. 2016. Programmable Potentials: Approximate N-body potentials from coarse-level logic. Scientific Reports 6(1): 33415.