AI Research Associate for Early-Stage Scientific Discovery

Morad Behandish1*, John T. Maxwell III1, Johan de Kleer1

1 Palo Alto Research Center (PARC), 3333 Coyote Hill Road, Palo Alto, California 94304 (www.parc.com)

* Corresponding Author, e-mail: moradbeh@parc.com.
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Artificial intelligence (AI) has been increasingly applied in scientific activities for decades; however, it is still far from an insightful and trustworthy collaborator in the scientific process. Most existing AI methods are either too simplistic to be useful in real problems faced by scientists or too domain-specialized (even dogmatized), stifling transformative discoveries or paradigm shifts. We present an AI research associate for early-stage scientific discovery based on (a) a novel minimally-biased ontology for physics-based modeling that is context-aware, interpretable, and generalizable across classical and relativistic physics; (b) automatic search for viable and parsimonious hypotheses, represented at a high level (via domain-agnostic constructs) with built-in invariants, e.g., postulated forms of conservation principles implied by a presupposed spacetime topology; and (c) automatic compilation of the enumerated hypotheses to domain-specific, interpretable, and trainable/testable tensor-based computation graphs to learn phenomenological relations, e.g., constitutive or material laws, from sparse (and possibly noisy) data sets.

Introduction

Data-driven AI methods have been applied extensively in the past few decades to distill nontrivial physics-based insights (scientific discovery) and to predict complex dynamical behavior (scientific simulation) (Stevens et al. 2020). Notwithstanding their effectiveness and efficiency in classification, regression, and forecasting tasks, statistical learning methods can hardly ever evaluate the soundness of a function fit, explain the reasons behind observed correlations, or provide sufficiently strong guarantees to replace parsimonious and explainable scientific expressions such as differential equations (DEs). Hybrid methods, such as "physics-informed/inspired/guided" architectures for neural nets and loss functions that penalize both prediction and DE residual errors (Raissi, Perdikaris, and Karniadakis 2019; Wei and Chen 2019; Daw et al. 2020), and graph-nets based on control theory and combinatorial structures (Cranmer et al. 2019; Seo and Liu 2019; Sanchez-Gonzalez et al. 2020), are all important steps towards explainability; however, the built-in ontological biases in most machine learning (ML) frameworks prevent them from thinking outside the box to discover not only the known-unknowns, but also the unknown-unknowns, during early stages of the scientific process.

Contributions

We present 'cyber-physicist' (CyPhy), our novel AI research associate for the early-stage scientific process of hypothesis generation and initial (in)validation, grounded in the most invariable mathematical foundations of classical and relativistic physics. Our framework distinguishes itself from existing rule-based reasoning, statistical learning, and hybrid AI methods by:

(1) an ability to rapidly enumerate and test a diverse set of mathematically sound and parsimonious physical hypotheses, starting from a few basic assumptions on the embedding spacetime topology;

(2) a distinction between non-negotiable mathematical truisms (e.g., conservation laws or symmetries), which are directly implied by properties of spacetime, and phenomenological relations (e.g., constitutive laws), whose characterization relies indisputably on empirical observation, justifying targeted use of data-driven methods (e.g., ML or polynomial regression); and

(3) a "simple-first" strategy (following Occam's razor) to search for new hypotheses by incrementally introducing latent variables that are expected to exist based on topological foundations of physics.

Background

AI-assisted discovery of scientific knowledge has been an active area of research (Langley 1998) since long before the rise of GPU-accelerated deep learning (DL). As computational power and data sources become more ubiquitous, model-based, data-driven, and hybrid AI methods are playing an increasingly important role in various scientific activities (Kitano 2016; Raghu and Schmidt 2020).

Efforts related to our approach to scientific hypothesis generation and evaluation are mostly engineered after how humans approach scientific discovery, including sequential rule-based symbolic regression (Schmidt and Lipson 2009; Udrescu and Tegmark 2020), latent space representation learning via deep neural net auto-encoders (Iten et al. 2020; Nautrup et al.
2020), and strategic combinations of divide-and-conquer, unsupervised learning, simplification by penalizing description lengths in the loss function, and a posteriori unification by clustering (Wu and Tegmark 2019). While these and other efforts have shown great promise for elevating AI to the role of an autonomous, creative, and insightful collaborator that can offer human scientists a set of viable options to consider, their applications have remained limited to rather basic examples.

On the more domain-specialized end, DL has been widely successful in classification, regression, and forecasting tasks in scientific areas as diverse as turbulence (Miyanawala and Jaiman 2017; Wang et al. 2020), chaotic particle dynamics (Breen et al. 2020), molecular chemistry and materials science (Butler et al. 2018), and protein engineering (Yang, Wu, and Arnold 2019), among others. Most specialized DL architectures are ad hoc, designed (by humans) using narrow, domain-specific, and (by construction) biased knowledge and expertise, stifling innovation and surprise. Moreover, DL models that successfully capture nontrivial patterns in data are often difficult to explain, lack guarantees even within their training space, and extrapolate poorly to out-of-training scenarios (Mehta et al. 2019). Training such models for high-dimensional physics problems requires enormous data, which is either unavailable or too costly to obtain in many experimental sciences.

The Cyber-Physicist

We introduce an AI tool that can bridge multiple levels of abstraction, using a domain-agnostic representation scheme to express a wide range of mathematically viable physical hypotheses by exploiting common structural invariants across physics. Our approach entails:

(a) defining a relatively unbiased ontology rooted in fundamental abstractions that are common to all known theories of classical and relativistic physics;

(b) constructing a constrained search space to enumerate viable hypotheses with postulated invariants, e.g., built-in conservation laws that are consistent with the presupposed spacetime topology; and

(c) automatically assembling interpretable ML architectures for each hypothesis, to estimate parameters for phenomenological relations from empirical data.

At the core of (a) is a powerful mathematical abstraction of physical governing equations rooted in algebraic topology and differential geometry (Frankel 2011). This abstraction leads to an ontological commitment to the relationship between physical measurement and basic properties of the embedding spacetime, but nothing more, to leave room for innovation and surprise. This relationship has been shown to be responsible for the analogies and common structure across physics (Tonti 2013), which is exploited in (b), along with search heuristics based on analogical reasoning. Each viable hypothesis is automatically compiled to an interpretable "computation graph" (a tensor-based architecture, akin to a neural net with convolution layers to compute differentiation/integration and (non)linear local operators for constitutive equations) for a given cellular decomposition of the embedding spacetime, using well-established concepts from cellular homology (Hatcher 2001) and the exterior calculus of differential and discrete forms (Bott and Tu 1982; Hirani 2003) that are under-utilized in AI.

Topological Foundations of Physics

The key enabler of our AI framework is a simple type system for (a) physical variables, based on how they are measured in spacetime; and (b) physical relations, based on their (topological vs. metric) nature and the variables they connect. Following the ground-breaking discoveries by a number of mathematicians, physicists, and electrical engineers (Kron 1963; Roth 1955; Branin 1966) towards a general network theory, Tonti explained the fascinating analogies across classical and relativistic physics in his pioneering life-long work (Tonti 2013) by reframing them in the language of cellular homology, leading to informal classification diagrams.

Tonti diagrams can be formalized as directed graphs with strongly typed nodes for variables and edges for relations. The variables are typed as (d1, d2)-forms based on their measurement on d1- and d2-dimensional submanifolds (d1- and d2-cells) of space and time, respectively. For instance, to model heat transfer in (3+1)D spacetime, temperature is typed as a (0, 1)-form because it is measured at spatial points (0-cells) and during temporal intervals (1-cells), whereas heat flux is typed as a (2, 1)-form because it is measured over spatial surfaces (2-cells) and during temporal intervals (1-cells). In classical calculus, both of these variables reduce to scalar and vector fields, probed at spatial points and at temporal instants, to write down compact pointwise DEs; however, keeping track of the topological and geometric character of DEs is key to a deeper understanding of how known physical theories work, and to building on top of it for AI-assisted discovery of new physics grounded in mathematical foundations.

The spatiotemporal cells (or embedding manifolds) are further classified as primary or secondary, endowed with inner or outer orientations, respectively, depending on how the variables change sign in a hypothetical reversal of spacetime orientation (Mattiussi 2000). The cells are related by topological duality (Fig. 1 (a)). For example, an inner-oriented curve (1-cell, σ1) sitting in primary space, along which temperature variations are measured, is dual to an outer-oriented surface (2-cell, σ̃2) sitting in secondary space, over which heat flux is measured; the two cells are spatially registered and consistently oriented if we embed them in two co-located "copies" of 3D space.

Figure 1: A topology-aware representation for physics (Tonti 2013): (a) variables associated with spatial and temporal cells of various dimensions give rise to primary forms and secondary forms (also called pseudo-forms); (b) resulting in 32 possible types for spatio-temporal forms, and an underlying structure for fundamental theories of physics.

Figure 2: Tonti diagrams are recipes to generate governing equations in different contexts, defined by a continuum, discrete, or semi-discrete setting and a topological embedding of the variables based on how they are measured. The conservation laws in terms of co-boundary operators result directly from assumed properties of space (or spacetime), while constitutive relations must be learned from data (e.g., via regression/ML).

The relations among variables on the Tonti diagrams are typed based on the pairs of variables they relate, as well as the nature of the relation itself:

• Topological relations map spatiotemporal forms to forms of one higher dimension in space or time via incidence relations, and are responsible for propagation of information in spacetime through incident cells.

• Metric relations locally map forms defined over dual cells to one another based on phenomenological properties, spatial lengths, and temporal durations, and are responsible for local distortion of information.

• Algebraic relations are in-place, i.e., map a given form to another form of the same type, and can be used to capture initial/boundary conditions, external source/sink terms, or cross-physics couplings between variables of the same type on different diagrams.

These relations are drawn in Fig. 1 as vertical arrows, horizontal (or horizontal-diagonal) arrows, and loops (1-cycles), respectively. The interpretation of these relations as symbolic or numerical operations depends on the choice of a cellular decomposition of spacetime on which they operate. For example, using a continuum spacetime with infinitesimal cells, the variables are viewed as differential forms, and the topological operators on them are interpreted as exterior derivatives (Bott and Tu 1982). In elementary calculus, these operators give rise to gradient, curl, and divergence in space and the partial derivative in time, in terms of the scalar and vector fields that are proxy to these forms, leading to partial DEs (PDEs). In a discrete (or semi-discrete) setting, on the other hand, the same diagram can be used to produce integral (or integro-differential) equations that capture the same fundamental conservation and constitutive realities, where the variables are viewed as co-chains, also called discrete forms (or mixed forms, e.g., discrete in space and differential in time, or vice versa), and the topological operators become co-boundary operators, which are fundamental in cellular homology (Hatcher 2001). For example, using a semi-discretization in space with integral quantities associated with 0-, 1-, 2-, and 3-cells on a pair of staggered unstructured meshes in 3D, while keeping time as a continuum, one obtains the semi-discrete form of the heat equation as a system of ordinary DEs (ODEs) (Fig. 2 (a)). Upon discretization of time, one obtains algebraic equations that can be solved or parameter-estimated via tensor-based ML.

It is important to note that 3D meshes in space and 1D time-stepping are not the only ways to provide a combinatorial topology for interpreting Tonti diagrams in a discrete setting. Another example is a directed graph representation of lumped-parameter networks, such as system models in Modelica or electrical circuits in Spice. The variables in this case are associated with nodes, edges, and meshes (i.e., primitive cycles), and incidence relations are obtained from graph connectivity and edge directions. The same topological operator that leads to a spatial divergence, discretized by a sum of fluxes on the incident faces of a volume in a 3D mesh, also leads to a superposition of forces on interacting planets, a sum of currents in/out of junctions in electrical circuits, and a superposition of torques on kinematic chains (Fig. 2 (b, c, d)). Both ODEs and PDEs, and their integral or integro-differential forms upon full or semi-discretization, can be captured with the same (abstract) operators, and Tonti diagrams serve as recipes to compose them to generate governing equations.

An Ontology for Scientific Process

We present a novel representation, called 'interaction networks' (I-nets), based on a generalization of Tonti diagrams that is expressive and versatile enough to accommodate novel scientific hypotheses, while retaining a basic commitment to philosophical principles such as parsimony (Occam's razor), measurement-driven classification of variables, and separation of non-negotiable mathematical properties of spacetime (homology) from domain-specific empirical knowledge (phenomenology). Data science is employed to help only with the latter.

• We conceptualize three levels of abstraction related by inheritance: abstract (symbolic) I-nets → discrete (cellular) I-nets → numerical (tensor-based) I-nets.

• At each level, an I-net instance is contextualized by user-defined assumptions on spacetime topology, semantics of physical quantities, and structural restrictions on allowable diagrams based on analogical reasoning and domain-specific insight (if available).

• Every I-net instance distinguishes between topological and metric operators; however, it has additional degrees of freedom (beyond Tonti diagrams) for the latter, to allow for phenomenological relations among variables that may not be dual to each other.

The last of these is motivated by the observation that some existing middle-ground theories use phenomenological relations to capture a combination of topological and metric aspects.

We define an abstract (symbolic) I-net on a single D-space as a finite collection of primary and/or secondary co-chain complexes that are inter-connected by phenomenological links, as shown in Fig. 4 (a). Each co-chain complex is a sequence of (symbolic) d-forms related by (symbolic) co-boundary operators from d-forms to (d+1)-forms (0 ≤ d ≤ D). The interpretation of the d → (d+1) maps depends on the embedding dimension D; for instance, if D = 1 the only option for the input is d = 0, leading to a simple partial derivative (0 → 1), whereas for D = 3 we can have d = 0, 1, 2, leading to gradient (0 → 1), curl (1 → 2), and divergence (2 → 3) operations, respectively.

These sequences may represent different (mechanical, electrical, thermal, etc.) domains of physics. Although, for most known physics, each domain's theory appears as one pair of (primary and secondary) sequences in tandem, connected by horizontal (or horizontal-diagonal) constitutive relations leading to Tonti diagrams, we do not make any such restriction when looking for new theories. The cross-sequence links can thus represent both single-physics constitutive relations and multi-physics coupling interactions. Conservation laws, on the other hand, are represented by a balance between the output of a topological operator and an external source/sink, the latter being represented by a loop.
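To make the co-boundary machinery concrete, here is a minimal, hypothetical sketch (not the paper's code, and the graph is an invented toy example) in Python with NumPy. The signed node-to-edge incidence matrix of a small directed graph, of the kind used for lumped-parameter networks, acts as the 0 → 1 co-boundary operator on nodal quantities; its transpose accumulates signed edge flows into nodes (a Kirchhoff-style balance), and summing signed differences around a closed loop vanishes identically, which is the discrete counterpart of the identity δ ∘ δ = 0 that underlies the built-in conservation laws described above.

```python
import numpy as np

# Hypothetical toy example: a small directed graph (a 1D cell complex, like a
# Spice netlist) with edges given as (tail, head) pairs over 4 nodes.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n_nodes = 4

# The 0 -> 1 co-boundary operator as a signed incidence matrix:
# delta0[e, n] is -1 at the tail and +1 at the head of edge e.
delta0 = np.zeros((len(edges), n_nodes))
for e, (tail, head) in enumerate(edges):
    delta0[e, tail], delta0[e, head] = -1.0, +1.0

# Applying delta0 to a 0-cochain (nodal potential) yields a 1-cochain of
# signed differences along edges -- a discrete "gradient".
potential = np.array([1.0, 0.5, 0.25, 0.0])
edge_drop = delta0 @ potential

# The transpose accumulates signed edge flows into nodes, so a Kirchhoff-style
# conservation law reads: delta0.T @ flux == external source/sink at each node.
net_inflow = delta0.T @ edge_drop

# Edges 0, 1, 2 bound a 2-cell (the closed loop 0 -> 1 -> 2 -> 0); summing
# signed drops around it is the next co-boundary applied to edge_drop, and it
# vanishes for *any* potential: the discrete counterpart of delta o delta = 0.
loop = np.array([1.0, 1.0, 1.0, 0.0])
assert abs(float(loop @ edge_drop)) < 1e-12

# A constant potential lies in the kernel of delta0 (no differences, no flow).
assert np.allclose(delta0 @ np.ones(n_nodes), 0.0)
```

The same pattern scales to meshes: a face-to-volume incidence matrix plays the role of a discrete divergence, and balancing its output against a source term expresses a conservation law using topological (metric-free) information only.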
It is often more convenient to define product spaces (e.g., Figure 3 shows a few other examples of Tonti diagrams separate 3D space and 1D time, as opposed to 4D spacetime) for fundamental theories in classical and relativistic physics. in which conservation laws are stated as sums of incom- The differences amount to (a) topological and metric con- ing topological relations being balanced against an external text; (b) relevant variables and their dimensions/units; and source/sink. To accommodate such representations, we de- (c) phenomenological relations. fine abstract (symbolic) I-nets on a product of a D1 −space Figure 3: Tonti diagrams capture the common structure responsible for analogies across classical and relativistic physics with a clear distinction between topological and phenomenological relations that follow certain rules. Figure 4: I-nets are generalizations of Tonti diagrams for finite topological products of finite-dimensional spaces with relaxed rules for feasible phenomenological links to accomodate middle-ground theories. and a D2 −space as multi-sequences of co-chains, connected the next step is to generate and test the hypotheses in a by phenomenological links, as before. It is possible to form “simple-first” fashion. The search space is defined by a di- 22 = 4 possible such multi-sequences with various ori- rected acyclic graph (DAG) whose nodes (i.e., ‘states’) rep- entation combinations, two of which lead to so-called me- resent symbolic I-net instances. The edges (i.e., state tran- chanical and field theories (Tonti 2013), shown in Fig. 1 sitions) represent generating a new I-net structure by incre- for (3+1)D spacetime and repeated in Fig. 4 (b) for higher- mentally adding complexity to the parent state. Each action dimensional pairs of abstract topological spaces. 
This con- can be one or composition of (a) defining a new symbolic struction is generalized to products of more than two spaces variable, in an existing co-chain complex, by applying a in a straightforward combinatorial fashion. topological operator to an existing variable; (b) defining a Based on the topological context, the semantics for co- new variable in a latent co-chain complex; and (c) adding boundary operators is unambiguously determined by the di- phenomenological links of prescribed form and unknown mensions of the two variables (i.e., co-chains) they relate. parameters, connecting existing variables. The search is However, phenomenological links require specifying a pa- guided by a loss function determined by how well the hy- rameterization of possibly nonlinear, in-place, and purely potheses represented by these I-net structures explain a metric relations they represent, using unknown parameters given dataset. The algorithm may also be equipped with that must be learned from data. user-specified heuristic rules to prune the search space or Once one or more hypotheses are specified in the lan- prioritize paths that are perceived as “more likely” due to guage of abstract (symbolic) I-nets with unknown phe- structural analogies with existing theories. nomenological parameters (e.g., thermal conductivity in the The input to the search algorithm includes the bare min- earlier heat transfer example), the parameters can be opti- imum contextual information such as the assumed under- mized to fit the data and the regression error can be used to lying topology, a preset number of physical domains, and evaluate the fitness of hypotheses. the types of measured variables, e.g., spatiotemporal as- sociations, tensor ranks and shapes, and dimensions/units. 
A Search for Viable Hypotheses The search starts from an “initial” I-net instance (i.e., the Having defined a combinatorial representation of viable hy- ‘root’) that embodies only measured variable(s) with no ini- potheses that are partially ordered in terms of complexity, tial edges except the ones that are asserted a priori, e.g., Figure 5: The search space for the dynamics of a pendulum in 1D time. The complete hypotheses (yellow nodes) correspond to I-net structures that pose new nontrivial equations to be tested against data, whereas incomplete hypotheses (white nodes) have “dangling” branches that are completed in their child states. Figure 6: The hypotheses H-04 and H-08 of Fig. 5 are enumerated and visualized by the software and evaluated against data (split 0.7-0.3 for training/testing). Both energy (first-order) and and torque (second-order) forms of the governing equation are discovered without human intervention. The former was quite unexpected, since its I-net structure does not correspond to a Tonti diagram. The latter has a larger error due to finite difference discretization. loops for initial/boundary conditions or source terms, if ap- the hypothesis H-01 produces a new variable typed as a plicable. The spatio-temporal types and physical semantics 1−pseudo-form T (e τ 1 ) = f1 (θ(∗e τ 1 )), where the ∗−operator 1 for these variables are provided by the experimentalist. takes τe to its dual: ∗(e τ , τe + ) = τe0 + /2. However, 0 0 For example, consider a simple pendulum (Fig. 5 (a)). We until this new variable is reached through another path to have only 1D time, leading to a topological space of inter- close a cycle and pose a nontrivial equation, we do not connected time instants τ 0 , τe0 = τ 0 + /2 and time intervals have a complete hypothesis to (in)validate against data. Fur- τ 1 = (τ 0 , τ 0 + ), τe1 = (eτ 1 , τe1 + ) to which data may ther down the search DAG, H-08 defines a new variable be associated. 
Suppose we are given time series data for an- typed as a 0−pseudo-form L(e τ 0 ) = f2 (ω(∗e τ 0 )) where 0 0 0 gular position θ(τ 0 ). The initial I-net instance is a single ∗e τ = (e τ − /2, τe + /2). The co-boundary operation   symbolic variable for this 0−form, which can be differen- L(eτ 0 ) → T (e τ 1 ) = δ[L](eτ 1 ), closes the cycle and produces tiated only once in primary 1D time to obtain angular ve- a commutative diagram (Fig. 5 (c)) leading to: locity as a 1−form: θ(τ 0 ) → ω(τ 1 ) = δ[θ](τ 1 ) at the root EH-08 (θ; f1 , f2 ) = f1 (θ) − δ ∗ [f2 (δ[θ])] = 0, (1) of the search DAG (Fig. 5 (b)). The DAG is expanded by adding new phenomenological links, either between two ex- where f1 , f2 are selected from restricted function spaces isting variables, or between an existing variable and one in a F1 , F2 to avoid overfitting (e.g., parameterized by a lin- newly added latent co-chain sequence (Fig. 5 (c)). In this ex- ear combination of domain-aware basis functions) and their ample, the hypotheses are numbered H-00 (the root) through parameters must be determined from data to minimize the H-15, enumerating all possible I-net structures formed by at residual error EH-08 over the entire period of data collec- most one latent co-chain complex in 1D time. The user can tion. A loss function can, for example, be defined as a mean- specify the maximum number of latent variables that the al- squared-error (MSE) to penalize violations uniformly over gorithm may consider, to keep the search tractable. the time series period: Not every introduction of new variables or relations makes nontrivial statements about physics. For example, LossH-08 = min min EH-08 (θ; f1 , f2 ) τe1 , (2) f1 ∈F1 f2 ∈F2 where k · kτe1 is an L2 −norm computed as a temporal in- same ODE upon differential interpretation of the I-nets. 
For 2 tegral, i.e., sum of squared errors EH-08 (θ; f1 , f2 ) over time ODEs which, after simplification, are linear combinations of 1 nonlinear (differential/algebraic) terms that are computable intervals τe where (1) is evaluated. In this example, it turns out that the best fit is achieved with f1 (θ) = c1 sin θ and from data, we can apply symbolic regression to estimate f2 (ω) = c2 ω where c2/c1 = −g/r. The latent variables the coefficients from data; for example, we tried LASSO- L(e τ 0 ) and L(e τ 0 ) turn out to be the familiar notions of an- regularized least-squares regression in PDE-FIND (Rudy gular momentum and torque, respectively, although the soft- et al. 2017) where each term involving a derivative is eval- ware need not know anything about them to generate and test uated using finite difference or polynomial approximation, what-if scenarios about their existence and correlations with whose results are shown in Fig. 6. angular position and velocity. Hence, interpretability of the There are at least two issues with this approach: discovered relationships by a human scientist does not re- First, more sophisticated regression or nonlinear program- quire predisposing the AI associate to such interpretations, ming methods are needed if the DE has terms that have enabling unexpected discoveries. nested nonlinear functions, i.e., cannot be represented as a In general, every state in the search DAG can be classified linear combination of nonlinear terms because of unknown as complete or incomplete hypotheses. The former are I-net coefficients embedded within each term. We solve this prob- structures with “dangling” branches that carry no new non- lem by directly mapping I-net structures to computation trivial information in addition to their parent states. Every graphs in PyTorch, skipping differential interpretation to time such a branch is turned into one or more closed cy- symbolic DEs altogether. 
cles by adding enough new variables and/or relations, a new Second, numerical approximation of symbolic PDEs is constraint is hypothesized that can be evaluated against data. a tricky business, as the discrete forms (in 3D space) may When adding new dangling branches to the I-net structure, not obey the conservation principles postulated by the I-net the search algorithm prioritizes actions that produce I-net structure after such approximations. It is difficult to sepa- structures similar to existing Tonti diagrams by assigning a rate discretization errors from modeling errors and noise in penalty factor to every violation of the common structure data. One of the key advantages of I-nets is the rich geo- (e.g., diagonal phenomenological links connecting non-dual metric information in their type system that is fundamental cells). The loss for complete hypotheses can be computed as to physics-compatible and mimetic discretization schemes the sum of penalties for the I-net structure and the sum of (Koren et al. 2014; Palha et al. 2014; Lipnikov, Manzini, residual errors for each of the independent constraints, im- and Shashkov 2014) that ensure conservation laws are sat- plied by converging paths, multiplied by use-specified rel- isfied exactly as a discrete level, regardless of spatial mesh ative weight of the penalties and errors. We use an A* al- or time-step resolutions. Such information is lost upon con- gorithm to search the space of hypotheses. Since we can- version to symbolic DEs. Retaining this information is even not compute the error for incomplete hypotheses, we can more important when dealing with noisy data, because dis- only prune them when the increase in their penalty is large crete differentiation of noisy data (e.g., via finite difference enough that it would fail even if it had no error at all. or polynomial fitting) can substantially amplify the noise. 
The good news is that we can directly interpret the same Generating Symbolic Expressions I-net instance in integral form to generate equations over One of the practical features of our implementation in larger regions in space and/or time, to make the computa- Python is its ability to automatically convert I-net in- tions more resilient to noise. For example, in the heat equa- stances to symbolic DE expressions in SymPy, when the tion, the discrete divergence of heat flux over a single 3−cell co-boundary operators are interpreted in a differential set- is replaced by a flux integral over a collection of 3−cells, ting for infinitesimal cells ( → 0+ ); for example, equation and is equated against the volumetric intgeral of internal en- (1) can be rewritten as a nonlinear ODE: ergy within the collection. The cancellation of internal sur- face fluxes (discrete form of Gauss’ divergence theorem) ∂ h  i EH-08 (θ; f1 , f2 ) = f1 (θ) − f2 θ̇(t) . (3) is built into the interpretation based on cellular homology. ∂t The integrals can be computed using higher-order integra- As a result, the generated hypotheses can be evaluated using tion schemes, e.g., using polynomial interpolation with un- any number of existing ML or symbolic regression frame- derfitting to filter the noise. works that standardize on ODE/PDE inputs. For example, Further details on directly and automatically mapping the using non-orthogonal basis functions {1, x, x2 , sin x, cos x} abstract (symbolic) I-net structures to discrete (cellular) and to span both function spaces F1 , F2 , we can substitute for numerical (tensor-based) I-net instance (e.g., computation both symbolic functions: graphs in PyTorch), learning scale-aware phenomenologi- cal relations, and physics-compatible discretization and de- f1 (θ) := c01 + c11 θ + c21 θ2 + c31 sin θ + c31 cos θ, (4) noising will be presented in a full paper. 
f2 (θ̇) := c02 + c12 θ̇ + c22 θ̇2 + c32 sin θ̇ + c32 cos θ̇, (5) into (3) to obtain a symbolic second-order (non)linear ODE Real-World Scientific Discovery in SymPy. Next, the software performs algebraic simpli- Figures 7 and 8 illustrate the application of our AI approach fication to identify equivalence classes of hypotheses that, to an elastodynamics challenge problem provided by AFRL despite coming from different I-net structures, lead to the in the course of the DARPA AI Research Associate (AIRA) Figure 7: The search DAG and a number of viable hypotheses to explain ultrasound wavefield in metal parts. Figure 8: The AI associate discovers the (integral form) of the wave equation as well as the proper length/time scale at which the heterogeneous material properties (in this case, speed of sound) must be defined. program that supported the development of CyPhy. The in- Conclusion put is noisy data obtained by ultrasound imaging, measured in (2+1)D spacetime over the surface of several material Statistical learning methods, despite their accuracy and ef- samples with heterogeneous properties. ficiency in narrow regimes for which they are carefully en- gineered, are not sufficient to independently acquire deep Figure 7 illustrates the search DAG along with a num- understandings of the scientific problems they are applied ber of I-net structures for viable hypotheses, each postulat- to. Human scientists continue to handle most of knowledge- ing the relevance of a conservation law and existence of a centric aspects of the scientific process based on domain- few phenomenological relations. Figure 8 shows the rank- specific insight, experience, and expertise. ing of these hypotheses based on their residual errors when Our novel approach to early-stage scientific hypothesis tested against data. Each hypothesis can be interpreted in generation and testing demonstrates a path forward towards differential, integral, or integro-differential forms. 
The results demonstrate that integral forms applied to wide spatial and temporal neighborhoods (of ∼25 grid elements along each axis) with high-order polynomial underfitting (up to cubic in each coordinate) are preferable to strictly local numerical schemes such as finite-difference discretization: they yield a length/time scale-aware definition of (nonlocal) phenomenological relations as well as physics-compatible (i.e., mimetic) discretization and de-noising.

Conclusion

Statistical learning methods, despite their accuracy and efficiency in the narrow regimes for which they are carefully engineered, are not sufficient to independently acquire a deep understanding of the scientific problems they are applied to. Human scientists continue to handle most knowledge-centric aspects of the scientific process based on domain-specific insight, experience, and expertise.

Our novel approach to early-stage scientific hypothesis generation and testing demonstrates a path forward towards context-aware, generalizable, and interpretable AI for scientific discovery. Our AI associate (CyPhy) distinguishes between non-negotiable mathematical truisms, implied by the relationship between measurement and the presupposed spacetime topology, and phenomenological relations that are at the mercy of empirical learning. Data-driven regression is targeted at the latter, enabling the distillation of governing equations from sparse and noisy data while providing deep insight into the mathematical foundations.

Acknowledgment

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00111990029.

References

Bott, R.; and Tu, L. W. 1982. Differential Forms in Algebraic Topology. Springer Science & Business Media.

Branin, F. H. 1966. The Algebraic-Topological Basis for Network Analogies and the Vector Calculus. In Symposium on Generalized Networks, 453–491. Polytechnic Institute of Brooklyn, NY.

Breen, P. G.; Foley, C. N.; Boekholt, T.; and Zwart, S. P. 2020. Newton versus the Machine: Solving the Chaotic Three-Body Problem Using Deep Neural Networks. Monthly Notices of the Royal Astronomical Society 494(2): 2465–2470.

Butler, K. T.; Davies, D. W.; Cartwright, H.; Isayev, O.; and Walsh, A. 2018. Machine Learning for Molecular and Materials Science. Nature 559(7715): 547–555. doi:10.1038/s41586-018-0337-2.

Cranmer, M. D.; Xu, R.; Battaglia, P.; and Ho, S. 2019. Learning Symbolic Physics with Graph Networks. arXiv preprint arXiv:1909.05862.

Daw, A.; Thomas, R. Q.; Carey, C. C.; Read, J. S.; Appling, A. P.; and Karpatne, A. 2020. Physics-Guided Architecture (PGA) of Neural Networks for Quantifying Uncertainty in Lake Temperature Modeling. In Proceedings of the 2020 SIAM International Conference on Data Mining, 532–540. Society for Industrial and Applied Mathematics (SIAM).

Frankel, T. 2011. The Geometry of Physics: An Introduction. Cambridge University Press.

Hatcher, A. 2001. Algebraic Topology. Cornell University.

Hirani, A. N. 2003. Discrete Exterior Calculus. Ph.D. dissertation, California Institute of Technology.

Iten, R.; Metger, T.; Wilming, H.; Del Rio, L.; and Renner, R. 2020. Discovering Physical Concepts with Neural Networks. Physical Review Letters 124(1): 010508.

Kitano, H. 2016. Artificial Intelligence to Win the Nobel Prize and Beyond: Creating the Engine for Scientific Discovery. AI Magazine 37(1): 39–49.

Koren, B.; Abgrall, R.; Bochev, P.; Frank, J. E.; and Perot, B. 2014. Physics-Compatible Numerical Methods. Journal of Computational Physics 257(Part B): 1039–1039.

Kron, G. 1963. Diakoptics: The Piecewise Solution of Large-Scale Systems, volume 2. MacDonald.

Langley, P. 1998. The Computer-Aided Discovery of Scientific Knowledge. In International Conference on Discovery Science, 25–39. Springer.

Lipnikov, K.; Manzini, G.; and Shashkov, M. 2014. Mimetic Finite Difference Method. Journal of Computational Physics 257: 1163–1227.

Mattiussi, C. 2000. The Finite Volume, Finite Element, and Finite Difference Methods as Numerical Methods for Physical Field Problems. Advances in Imaging and Electron Physics 113: 1–147.

Mehta, P.; Bukov, M.; Wang, C. H.; Day, A. G. R.; Richardson, C.; Fisher, C. K.; and Schwab, D. J. 2019. A High-Bias, Low-Variance Introduction to Machine Learning for Physicists. Physics Reports 810: 1–124.

Miyanawala, T. P.; and Jaiman, R. K. 2017. An Efficient Deep Learning Technique for the Navier-Stokes Equations: Application to Unsteady Wake Flow Dynamics. arXiv preprint.

Nautrup, H. P.; Metger, T.; Iten, R.; Jerbi, S.; Trenkwalder, L. M.; Wilming, H.; Briegel, H. J.; and Renner, R. 2020. Operationally Meaningful Representations of Physical Systems in Neural Networks. arXiv preprint arXiv:2001.00593.

Palha, A.; Rebelo, P. P.; Hiemstra, R.; Kreeft, J.; and Gerritsma, M. 2014. Physics-Compatible Discretization Techniques on Single and Dual Grids, with Application to the Poisson Equation of Volume Forms. Journal of Computational Physics 257: 1394–1422.

Raghu, M.; and Schmidt, E. 2020. A Survey of Deep Learning for Scientific Discovery. arXiv preprint arXiv:2003.11755.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2019. Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. Journal of Computational Physics 378: 686–707.

Roth, J. P. 1955. An Application of Algebraic Topology to Numerical Analysis: On the Existence of a Solution to the Network Problem. Proceedings of the National Academy of Sciences 41(7): 518–521.

Rudy, S. H.; Brunton, S. L.; Proctor, J. L.; and Kutz, J. N. 2017. Data-Driven Discovery of Partial Differential Equations. Science Advances 3(4): e1602614.

Sanchez-Gonzalez, A.; Godwin, J.; Pfaff, T.; Ying, R.; Leskovec, J.; and Battaglia, P. W. 2020. Learning to Simulate Complex Physics with Graph Networks. arXiv preprint arXiv:2002.09405.

Schmidt, M.; and Lipson, H. 2009. Distilling Free-Form Natural Laws from Experimental Data. Science 324(5923): 81–85.

Seo, S.; and Liu, Y. 2019. Differentiable Physics-Informed Graph Networks. arXiv preprint arXiv:1902.02950.

Stevens, R.; Taylor, V.; Nichols, J.; Maccabe, A. B.; Yelick, K.; and Brown, D. 2020. AI for Science. Technical report, Argonne National Laboratory (ANL).

Tonti, E. 2013. The Mathematical Structure of Classical and Relativistic Physics: A General Classification Diagram. Modeling and Simulation in Science, Engineering and Technology. Birkhäuser. ISBN 9781461474210.

Udrescu, S. M.; and Tegmark, M. 2020. AI Feynman: A Physics-Inspired Method for Symbolic Regression. Science Advances 6(16): eaay2631.

Wang, R.; Kashinath, K.; Mustafa, M.; Albert, A.; and Yu, R. 2020. Towards Physics-Informed Deep Learning for Turbulent Flow Prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1457–1466.

Wei, Z.; and Chen, X. 2019. Physics-Inspired Convolutional Neural Network for Solving Full-Wave Inverse Scattering Problems. IEEE Transactions on Antennas and Propagation 67(9): 6138–6148.

Wu, T.; and Tegmark, M. 2019. Toward an AI Physicist for Unsupervised Learning. arXiv preprint arXiv:1810.10525.

Yang, K. K.; Wu, Z.; and Arnold, F. H. 2019. Machine-Learning-Guided Directed Evolution for Protein Engineering. Nature Methods 16(8): 687–694. doi:10.1038/s41592-019-0496-6.