AI Research Associate for Early-Stage Scientific Discovery

Morad Behandish1*, John T. Maxwell III1, Johan de Kleer1

1 Palo Alto Research Center (PARC), 3333 Coyote Hill Road, Palo Alto, California 94304 (www.parc.com)

* Corresponding Author, e-mail: moradbeh@parc.com.
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Artificial intelligence (AI) has been increasingly applied in scientific activities for decades; however, it is still far from an insightful and trustworthy collaborator in the scientific process. Most existing AI methods are either too simplistic to be useful in real problems faced by scientists or too domain-specialized (even dogmatized), stifling transformative discoveries or paradigm shifts. We present an AI research associate for early-stage scientific discovery based on (a) a novel minimally-biased ontology for physics-based modeling that is context-aware, interpretable, and generalizable across classical and relativistic physics; (b) automatic search for viable and parsimonious hypotheses, represented at a high level (via domain-agnostic constructs) with built-in invariants, e.g., postulated forms of conservation principles implied by a presupposed spacetime topology; and (c) automatic compilation of the enumerated hypotheses to domain-specific, interpretable, and trainable/testable tensor-based computation graphs to learn phenomenological relations, e.g., constitutive or material laws, from sparse (and possibly noisy) data sets.

Introduction

Data-driven AI methods have been applied extensively in the past few decades to distill nontrivial physics-based insights (scientific discovery) and to predict complex dynamical behavior (scientific simulation) (Stevens et al. 2020). Notwithstanding their effectiveness and efficiency in classification, regression, and forecasting tasks, statistical learning methods can hardly ever evaluate the soundness of a function fit, explain the reasons behind observed correlations, or provide sufficiently strong guarantees to replace parsimonious and explainable scientific expressions such as differential equations (DEs). Hybrid methods, such as "physics-informed/inspired/guided" architectures for neural nets and loss functions that penalize both prediction and DE residual errors (Raissi, Perdikaris, and Karniadakis 2019; Wei and Chen 2019; Daw et al. 2020), and graph-nets based on control theory and combinatorial structures (Cranmer et al. 2019; Seo and Liu 2019; Sanchez-Gonzalez et al. 2020), are all important steps towards explainability; however, the built-in ontological biases in most machine learning (ML) frameworks prevent them from thinking outside the box to discover not only the known-unknowns, but also the unknown-unknowns, during early stages of the scientific process.

Contributions

We present 'cyber-physicist' (CyPhy), our novel AI research associate for the early-stage scientific process of hypothesis generation and initial (in)validation, grounded in the most invariable mathematical foundations of classical and relativistic physics. Our framework distinguishes itself from existing rule-based reasoning, statistical learning, and hybrid AI methods by:

(1) an ability to rapidly enumerate and test a diverse set of mathematically sound and parsimonious physical hypotheses, starting from a few basic assumptions on the embedding spacetime topology;

(2) a distinction between non-negotiable mathematical truisms (e.g., conservation laws or symmetries), which are directly implied by properties of spacetime, and phenomenological relations (e.g., constitutive laws), whose characterization relies indisputably on empirical observation, justifying targeted use of data-driven methods (e.g., ML or polynomial regression); and

(3) a "simple-first" strategy (following Occam's razor) to search for new hypotheses by incrementally introducing latent variables that are expected to exist based on topological foundations of physics.

Background

AI-assisted discovery of scientific knowledge has been an active area of research (Langley 1998) since long before the rise of GPU-accelerated deep learning (DL). As computational power and data sources become more ubiquitous, model-based, data-driven, and hybrid AI methods are playing an increasingly important role in various scientific activities (Kitano 2016; Raghu and Schmidt 2020).

Efforts related to our approach to scientific hypothesis generation and evaluation are mostly engineered after how humans approach scientific discovery, including sequential rule-based symbolic regression (Schmidt and Lipson 2009; Udrescu and Tegmark 2020), latent space representation learning via deep neural net auto-encoders (Iten et al. 2020; Nautrup et al.
2020), and strategic combinations of divide-and-conquer, unsupervised learning, simplification by penalizing description lengths in the loss function, and a posteriori unification by clustering (Wu and Tegmark 2019). While these and other efforts have shown great promise for elevating AI to the role of an autonomous, creative, and insightful collaborator that can offer human scientists a set of viable options to consider, their applications have remained limited to rather basic examples.

On the more domain-specialized end, DL has been widely successful in classification, regression, and forecasting tasks in scientific areas as diverse as turbulence (Miyanawala and Jaiman 2017; Wang et al. 2020), chaotic particle dynamics (Breen et al. 2020), molecular chemistry and materials science (Butler et al. 2018), and protein engineering (Yang, Wu, and Arnold 2019), among others. Most specialized DL architectures are ad hoc, designed (by humans) using narrow, domain-specific, and (by construction) biased knowledge and expertise, stifling innovation and surprise. Moreover, DL models that successfully capture nontrivial patterns in data are often difficult to explain, lack guarantees even within their training space, and extrapolate poorly to out-of-training scenarios (Mehta et al. 2019). Training such models for high-dimensional physics problems requires enormous data, which is either unavailable or too costly to obtain in many experimental sciences.

The Cyber-Physicist

We introduce an AI tool that can bridge multiple levels of abstraction, using a domain-agnostic representation scheme to express a wide range of mathematically viable physical hypotheses by exploiting common structural invariants across physics. Our approach entails:

(a) defining a relatively unbiased ontology rooted in fundamental abstractions that are common to all known theories of classical and relativistic physics;

(b) constructing a constrained search space to enumerate viable hypotheses with postulated invariants, e.g., built-in conservation laws that are consistent with the presupposed spacetime topology; and

(c) automatically assembling interpretable ML architectures for each hypothesis, to estimate parameters for phenomenological relations from empirical data.

At the core of (a) is a powerful mathematical abstraction of physical governing equations rooted in algebraic topology and differential geometry (Frankel 2011). This abstraction leads to an ontological commitment to the relationship between physical measurement and basic properties of the embedding spacetime, but nothing more, to leave room for innovation and surprise. This relationship has been shown to be responsible for the analogies and common structure across physics (Tonti 2013), which is exploited in (b), along with search heuristics based on analogical reasoning. Each viable hypothesis is automatically compiled to an interpretable "computation graph" (a tensor-based architecture, akin to a neural net with convolution layers to compute differentiation/integration and (non)linear local operators for constitutive equations) for a given cellular decomposition of the embedding spacetime, using well-established concepts from cellular homology (Hatcher 2001) and the exterior calculus of differential and discrete forms (Bott and Tu 1982; Hirani 2003) that are under-utilized in AI.

Topological Foundations of Physics

The key enabler of our AI framework is a simple type system for (a) physical variables, based on how they are measured in spacetime; and (b) physical relations, based on their (topological vs. metric) nature and the variables they connect. Following the ground-breaking discoveries by a number of mathematicians, physicists, and electrical engineers (Kron 1963; Roth 1955; Branin 1966) towards a general network theory, Tonti explained the fascinating analogies across classical and relativistic physics in his pioneering life-long work (Tonti 2013) by reframing them in the language of cellular homology, leading to informal classification diagrams.

Tonti diagrams can be formalized as directed graphs with strongly typed nodes for variables and edges for relations. The variables are typed as (d1, d2)-forms based on their measurement on d1- and d2-dimensional submanifolds (d1- and d2-cells) of space and time, respectively. For instance, to model heat transfer in (3+1)D spacetime, temperature is typed as a (0, 1)-form because it is measured at spatial points (0-cells) and during temporal intervals (1-cells), whereas heat flux is typed as a (2, 1)-form because it is measured over spatial surfaces (2-cells) and during temporal intervals (1-cells). In classical calculus, both of these variables reduce to scalar and vector fields, probed at spatial points and at temporal instants, to write down compact pointwise DEs; however, keeping track of the topological and geometric character of DEs is key to a deeper understanding of how known physical theories work, and to building on top of it for AI-assisted discovery of new physics grounded in mathematical foundations.

The spatiotemporal cells (or embedding manifolds) are further classified as primary or secondary, endowed with inner or outer orientations, respectively, depending on how the variables change sign in a hypothetical reversal of spacetime orientation (Mattiussi 2000). The cells are related by topological duality (Fig. 1 (a)). For example, an inner-oriented curve (1-cell, σ1) sitting in primary space, along which temperature variations are measured, is dual to an outer-oriented surface (2-cell, σ̃2) sitting in secondary space, over which heat flux is measured; the two cells are spatially registered and consistently oriented if we embed them in two co-located "copies" of 3D space.

Figure 1: A topology-aware representation for physics (Tonti 2013): (a) variables associated with spatial and temporal cells of various dimensions give rise to primary forms and secondary forms (also called pseudo-forms); (b) resulting in 32 possible types for spatio-temporal forms, and an underlying structure for fundamental theories of physics.

Figure 2: Tonti diagrams are recipes to generate governing equations in different contexts, defined by a continuum, discrete, or semi-discrete setting and a topological embedding of the variables based on how they are measured. The conservation laws in terms of co-boundary operators result directly from assumed properties of space (or spacetime), while constitutive relations must be learned from data (e.g., via regression/ML).

The relations among variables on the Tonti diagrams are typed based on the pairs of variables they relate, as well as the nature of the relation itself:

• Topological relations map spatiotemporal forms to forms of one higher dimension in space or time via incidence relations, and are responsible for propagation of information in spacetime through incident cells.

• Metric relations locally map forms defined over dual cells to one another based on phenomenological properties, spatial lengths, and temporal durations, and are responsible for local distortion of information.

• Algebraic relations are in-place, i.e., map a given form to another form of the same type, and can be used to capture initial/boundary conditions, external source/sink terms, or cross-physics couplings between variables of the same type on different diagrams.

These relations are drawn in Fig. 1 as vertical arrows, horizontal (or horizontal-diagonal) arrows, and loops (1-cycles), respectively. The interpretation of these relations as symbolic or numerical operations depends on the choice of a cellular decomposition of spacetime on which they operate. For example, using a continuum spacetime with infinitesimal cells, the variables are viewed as differential forms, and the topological operators on them are interpreted as exterior derivatives (Bott and Tu 1982). In elementary calculus, these operators give rise to gradient, curl, and divergence in space and the partial derivative in time, in terms of the scalar and vector fields that are proxy to these forms, leading to partial DEs (PDEs). In a discrete (or semi-discrete) setting, on the other hand, the same diagram can be used to produce integral (or integro-differential) equations that capture the same fundamental conservation and constitutive realities, where the variables are viewed as co-chains, also called discrete forms (or mixed forms, e.g., discrete in space and differential in time, or vice versa), and the topological operators become co-boundary operators, which are fundamental in cellular homology (Hatcher 2001). For example, using a semi-discretization in space with integral quantities associated with 0-, 1-, 2-, and 3-cells on a pair of staggered unstructured meshes in 3D, while keeping time as a continuum, one obtains the semi-discrete form of the heat equation as a system of ordinary DEs (ODEs) (Fig. 2 (a)). Upon discretization of time, one obtains algebraic equations that can be solved or parameter-estimated via tensor-based ML.

It is important to note that 3D meshes in space and 1D time-stepping are not the only ways to provide a combinatorial topology for interpreting Tonti diagrams in a discrete setting. Another example is a directed graph representation of lumped-parameter networks, such as system models in Modelica or electrical circuits in Spice. The variables in this case are associated with nodes, edges, and meshes (i.e., primitive cycles), and incidence relations are obtained from graph connectivity and edge directions. The same topological operator that leads to a spatial divergence, discretized by a sum of fluxes on the incident faces of a volume in a 3D mesh, also leads to a superposition of forces on interacting planets, a sum of currents in/out of junctions in electrical circuits, and a superposition of torques on kinematic chains (Fig. 2 (b, c, d)). Both ODEs and PDEs, and their integral or integro-differential forms upon full or semi-discretization, can be captured with the same (abstract) operators, and Tonti diagrams serve as recipes to compose them to generate governing equations.

An Ontology for Scientific Process

We present a novel representation, called 'interaction networks' (I-nets), based on a generalization of Tonti diagrams that is expressive and versatile enough to accommodate novel scientific hypotheses, while retaining a basic commitment to philosophical principles such as parsimony (Occam's razor), measurement-driven classification of variables, and separation of non-negotiable mathematical properties of spacetime (homology) from domain-specific empirical knowledge (phenomenology). Data science is employed to help only with the latter.

• We conceptualize three levels of abstraction related by inheritance: abstract (symbolic) I-nets → discrete (cellular) I-nets → numerical (tensor-based) I-nets.

• At each level, an I-net instance is contextualized by user-defined assumptions on spacetime topology, semantics of physical quantities, and structural restrictions on allowable diagrams based on analogical reasoning and domain-specific insight (if available).

• Every I-net instance distinguishes between topological and metric operators; however, it has additional degrees of freedom (beyond Tonti diagrams) for the latter, to allow for phenomenological relations among variables that may not be dual to each other.

The last of these is motivated by the observation that some existing middle-ground theories use phenomenological relations to capture a combination of topological and metric aspects.

We define an abstract (symbolic) I-net on a single D-space as a finite collection of primary and/or secondary co-chain complexes that are inter-connected by phenomenological links, as shown in Fig. 4 (a). Each co-chain complex is a sequence of (symbolic) d-forms related by (symbolic) co-boundary operators from d-forms to (d+1)-forms (0 ≤ d ≤ D). The interpretation of the d → (d+1) maps depends on the embedding dimension D; for instance, if D = 1 the only option for the input is d = 0, leading to a simple partial derivative (0 → 1), whereas for D = 3 we can have d = 0, 1, 2, leading to gradient (0 → 1), curl (1 → 2), and divergence (2 → 3) operations, respectively.

These sequences may represent different (mechanical, electrical, thermal, etc.) domains of physics. Although, for most known physics, each domain's theory appears as one pair of (primary and secondary) sequences in tandem, connected by horizontal (or horizontal-diagonal) constitutive relations leading to Tonti diagrams, we do not make any such restriction when looking for new theories. The cross-sequence links can thus represent both single-physics constitutive relations and multi-physics coupling interactions. Conservation laws, on the other hand, are represented by a balance between the output of a topological operator and an external source/sink, the latter being represented by a loop.
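To make the co-boundary machinery concrete, here is a minimal, hypothetical sketch (not the paper's code, and the graph is an invented toy example) in Python with NumPy. The signed node-to-edge incidence matrix of a small directed graph, of the kind used for lumped-parameter networks, acts as the 0 → 1 co-boundary operator on nodal quantities; its transpose accumulates signed edge flows into nodes (a Kirchhoff-style balance), and summing signed differences around a closed loop vanishes identically, which is the discrete counterpart of the identity δ ∘ δ = 0 that underlies the built-in conservation laws described above.

```python
import numpy as np

# Hypothetical toy example: a small directed graph (a 1D cell complex, like a
# Spice netlist) with edges given as (tail, head) pairs over 4 nodes.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n_nodes = 4

# The 0 -> 1 co-boundary operator as a signed incidence matrix:
# delta0[e, n] is -1 at the tail and +1 at the head of edge e.
delta0 = np.zeros((len(edges), n_nodes))
for e, (tail, head) in enumerate(edges):
    delta0[e, tail], delta0[e, head] = -1.0, +1.0

# Applying delta0 to a 0-cochain (nodal potential) yields a 1-cochain of
# signed differences along edges -- a discrete "gradient".
potential = np.array([1.0, 0.5, 0.25, 0.0])
edge_drop = delta0 @ potential

# The transpose accumulates signed edge flows into nodes, so a Kirchhoff-style
# conservation law reads: delta0.T @ flux == external source/sink at each node.
net_inflow = delta0.T @ edge_drop

# Edges 0, 1, 2 bound a 2-cell (the closed loop 0 -> 1 -> 2 -> 0); summing
# signed drops around it is the next co-boundary applied to edge_drop, and it
# vanishes for *any* potential: the discrete counterpart of delta o delta = 0.
loop = np.array([1.0, 1.0, 1.0, 0.0])
assert abs(float(loop @ edge_drop)) < 1e-12

# A constant potential lies in the kernel of delta0 (no differences, no flow).
assert np.allclose(delta0 @ np.ones(n_nodes), 0.0)
```

The same pattern scales to meshes: a face-to-volume incidence matrix plays the role of a discrete divergence, and balancing its output against a source term expresses a conservation law using topological (metric-free) information only.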
It is often more convenient to define product spaces (e.g., Figure 3 shows a few other examples of Tonti diagrams separate 3D space and 1D time, as opposed to 4D spacetime) for fundamental theories in classical and relativistic physics. in which conservation laws are stated as sums of incom- The differences amount to (a) topological and metric con- ing topological relations being balanced against an external text; (b) relevant variables and their dimensions/units; and source/sink. To accommodate such representations, we de- (c) phenomenological relations. fine abstract (symbolic) I-nets on a product of a D1 −space Figure 3: Tonti diagrams capture the common structure responsible for analogies across classical and relativistic physics with a clear distinction between topological and phenomenological relations that follow certain rules. Figure 4: I-nets are generalizations of Tonti diagrams for finite topological products of finite-dimensional spaces with relaxed rules for feasible phenomenological links to accomodate middle-ground theories. and a D2 −space as multi-sequences of co-chains, connected the next step is to generate and test the hypotheses in a by phenomenological links, as before. It is possible to form “simple-first” fashion. The search space is defined by a di- 22 = 4 possible such multi-sequences with various ori- rected acyclic graph (DAG) whose nodes (i.e., ‘states’) rep- entation combinations, two of which lead to so-called me- resent symbolic I-net instances. The edges (i.e., state tran- chanical and field theories (Tonti 2013), shown in Fig. 1 sitions) represent generating a new I-net structure by incre- for (3+1)D spacetime and repeated in Fig. 4 (b) for higher- mentally adding complexity to the parent state. Each action dimensional pairs of abstract topological spaces. 
This con- can be one or composition of (a) defining a new symbolic struction is generalized to products of more than two spaces variable, in an existing co-chain complex, by applying a in a straightforward combinatorial fashion. topological operator to an existing variable; (b) defining a Based on the topological context, the semantics for co- new variable in a latent co-chain complex; and (c) adding boundary operators is unambiguously determined by the di- phenomenological links of prescribed form and unknown mensions of the two variables (i.e., co-chains) they relate. parameters, connecting existing variables. The search is However, phenomenological links require specifying a pa- guided by a loss function determined by how well the hy- rameterization of possibly nonlinear, in-place, and purely potheses represented by these I-net structures explain a metric relations they represent, using unknown parameters given dataset. The algorithm may also be equipped with that must be learned from data. user-specified heuristic rules to prune the search space or Once one or more hypotheses are specified in the lan- prioritize paths that are perceived as “more likely” due to guage of abstract (symbolic) I-nets with unknown phe- structural analogies with existing theories. nomenological parameters (e.g., thermal conductivity in the The input to the search algorithm includes the bare min- earlier heat transfer example), the parameters can be opti- imum contextual information such as the assumed under- mized to fit the data and the regression error can be used to lying topology, a preset number of physical domains, and evaluate the fitness of hypotheses. the types of measured variables, e.g., spatiotemporal as- sociations, tensor ranks and shapes, and dimensions/units. 
A Search for Viable Hypotheses The search starts from an “initial” I-net instance (i.e., the Having defined a combinatorial representation of viable hy- ‘root’) that embodies only measured variable(s) with no ini- potheses that are partially ordered in terms of complexity, tial edges except the ones that are asserted a priori, e.g., Figure 5: The search space for the dynamics of a pendulum in 1D time. The complete hypotheses (yellow nodes) correspond to I-net structures that pose new nontrivial equations to be tested against data, whereas incomplete hypotheses (white nodes) have “dangling” branches that are completed in their child states. Figure 6: The hypotheses H-04 and H-08 of Fig. 5 are enumerated and visualized by the software and evaluated against data (split 0.7-0.3 for training/testing). Both energy (first-order) and and torque (second-order) forms of the governing equation are discovered without human intervention. The former was quite unexpected, since its I-net structure does not correspond to a Tonti diagram. The latter has a larger error due to finite difference discretization. loops for initial/boundary conditions or source terms, if ap- the hypothesis H-01 produces a new variable typed as a plicable. The spatio-temporal types and physical semantics 1−pseudo-form T (e τ 1 ) = f1 (θ(∗e τ 1 )), where the ∗−operator 1 for these variables are provided by the experimentalist. takes τe to its dual: ∗(e τ , τe + ) = τe0 + /2. However, 0 0 For example, consider a simple pendulum (Fig. 5 (a)). We until this new variable is reached through another path to have only 1D time, leading to a topological space of inter- close a cycle and pose a nontrivial equation, we do not connected time instants τ 0 , τe0 = τ 0 + /2 and time intervals have a complete hypothesis to (in)validate against data. Fur- τ 1 = (τ 0 , τ 0 + ), τe1 = (eτ 1 , τe1 + ) to which data may ther down the search DAG, H-08 defines a new variable be associated. 
Suppose we are given time series data for an- typed as a 0−pseudo-form L(e τ 0 ) = f2 (ω(∗e τ 0 )) where 0 0 0 gular position θ(τ 0 ). The initial I-net instance is a single ∗e τ = (e τ − /2, τe + /2). The co-boundary operation   symbolic variable for this 0−form, which can be differen- L(eτ 0 ) → T (e τ 1 ) = δ[L](eτ 1 ), closes the cycle and produces tiated only once in primary 1D time to obtain angular ve- a commutative diagram (Fig. 5 (c)) leading to: locity as a 1−form: θ(τ 0 ) → ω(τ 1 ) = δ[θ](τ 1 ) at the root EH-08 (θ; f1 , f2 ) = f1 (θ) − δ ∗ [f2 (δ[θ])] = 0, (1) of the search DAG (Fig. 5 (b)). The DAG is expanded by adding new phenomenological links, either between two ex- where f1 , f2 are selected from restricted function spaces isting variables, or between an existing variable and one in a F1 , F2 to avoid overfitting (e.g., parameterized by a lin- newly added latent co-chain sequence (Fig. 5 (c)). In this ex- ear combination of domain-aware basis functions) and their ample, the hypotheses are numbered H-00 (the root) through parameters must be determined from data to minimize the H-15, enumerating all possible I-net structures formed by at residual error EH-08 over the entire period of data collec- most one latent co-chain complex in 1D time. The user can tion. A loss function can, for example, be defined as a mean- specify the maximum number of latent variables that the al- squared-error (MSE) to penalize violations uniformly over gorithm may consider, to keep the search tractable. the time series period: Not every introduction of new variables or relations makes nontrivial statements about physics. For example, LossH-08 = min min EH-08 (θ; f1 , f2 ) τe1 , (2) f1 ∈F1 f2 ∈F2 where k · kτe1 is an L2 −norm computed as a temporal in- same ODE upon differential interpretation of the I-nets. 
For 2 tegral, i.e., sum of squared errors EH-08 (θ; f1 , f2 ) over time ODEs which, after simplification, are linear combinations of 1 nonlinear (differential/algebraic) terms that are computable intervals τe where (1) is evaluated. In this example, it turns out that the best fit is achieved with f1 (θ) = c1 sin θ and from data, we can apply symbolic regression to estimate f2 (ω) = c2 ω where c2/c1 = −g/r. The latent variables the coefficients from data; for example, we tried LASSO- L(e τ 0 ) and L(e τ 0 ) turn out to be the familiar notions of an- regularized least-squares regression in PDE-FIND (Rudy gular momentum and torque, respectively, although the soft- et al. 2017) where each term involving a derivative is eval- ware need not know anything about them to generate and test uated using finite difference or polynomial approximation, what-if scenarios about their existence and correlations with whose results are shown in Fig. 6. angular position and velocity. Hence, interpretability of the There are at least two issues with this approach: discovered relationships by a human scientist does not re- First, more sophisticated regression or nonlinear program- quire predisposing the AI associate to such interpretations, ming methods are needed if the DE has terms that have enabling unexpected discoveries. nested nonlinear functions, i.e., cannot be represented as a In general, every state in the search DAG can be classified linear combination of nonlinear terms because of unknown as complete or incomplete hypotheses. The former are I-net coefficients embedded within each term. We solve this prob- structures with “dangling” branches that carry no new non- lem by directly mapping I-net structures to computation trivial information in addition to their parent states. Every graphs in PyTorch, skipping differential interpretation to time such a branch is turned into one or more closed cy- symbolic DEs altogether. 
cles by adding enough new variables and/or relations, a new Second, numerical approximation of symbolic PDEs is constraint is hypothesized that can be evaluated against data. a tricky business, as the discrete forms (in 3D space) may When adding new dangling branches to the I-net structure, not obey the conservation principles postulated by the I-net the search algorithm prioritizes actions that produce I-net structure after such approximations. It is difficult to sepa- structures similar to existing Tonti diagrams by assigning a rate discretization errors from modeling errors and noise in penalty factor to every violation of the common structure data. One of the key advantages of I-nets is the rich geo- (e.g., diagonal phenomenological links connecting non-dual metric information in their type system that is fundamental cells). The loss for complete hypotheses can be computed as to physics-compatible and mimetic discretization schemes the sum of penalties for the I-net structure and the sum of (Koren et al. 2014; Palha et al. 2014; Lipnikov, Manzini, residual errors for each of the independent constraints, im- and Shashkov 2014) that ensure conservation laws are sat- plied by converging paths, multiplied by use-specified rel- isfied exactly as a discrete level, regardless of spatial mesh ative weight of the penalties and errors. We use an A* al- or time-step resolutions. Such information is lost upon con- gorithm to search the space of hypotheses. Since we can- version to symbolic DEs. Retaining this information is even not compute the error for incomplete hypotheses, we can more important when dealing with noisy data, because dis- only prune them when the increase in their penalty is large crete differentiation of noisy data (e.g., via finite difference enough that it would fail even if it had no error at all. or polynomial fitting) can substantially amplify the noise. 
The good news is that we can directly interpret the same Generating Symbolic Expressions I-net instance in integral form to generate equations over One of the practical features of our implementation in larger regions in space and/or time, to make the computa- Python is its ability to automatically convert I-net in- tions more resilient to noise. For example, in the heat equa- stances to symbolic DE expressions in SymPy, when the tion, the discrete divergence of heat flux over a single 3−cell co-boundary operators are interpreted in a differential set- is replaced by a flux integral over a collection of 3−cells, ting for infinitesimal cells ( → 0+ ); for example, equation and is equated against the volumetric intgeral of internal en- (1) can be rewritten as a nonlinear ODE: ergy within the collection. The cancellation of internal sur- face fluxes (discrete form of Gauss’ divergence theorem) ∂ h  i EH-08 (θ; f1 , f2 ) = f1 (θ) − f2 θ̇(t) . (3) is built into the interpretation based on cellular homology. ∂t The integrals can be computed using higher-order integra- As a result, the generated hypotheses can be evaluated using tion schemes, e.g., using polynomial interpolation with un- any number of existing ML or symbolic regression frame- derfitting to filter the noise. works that standardize on ODE/PDE inputs. For example, Further details on directly and automatically mapping the using non-orthogonal basis functions {1, x, x2 , sin x, cos x} abstract (symbolic) I-net structures to discrete (cellular) and to span both function spaces F1 , F2 , we can substitute for numerical (tensor-based) I-net instance (e.g., computation both symbolic functions: graphs in PyTorch), learning scale-aware phenomenologi- cal relations, and physics-compatible discretization and de- f1 (θ) := c01 + c11 θ + c21 θ2 + c31 sin θ + c31 cos θ, (4) noising will be presented in a full paper. 
f2 (θ̇) := c02 + c12 θ̇ + c22 θ̇2 + c32 sin θ̇ + c32 cos θ̇, (5) into (3) to obtain a symbolic second-order (non)linear ODE Real-World Scientific Discovery in SymPy. Next, the software performs algebraic simpli- Figures 7 and 8 illustrate the application of our AI approach fication to identify equivalence classes of hypotheses that, to an elastodynamics challenge problem provided by AFRL despite coming from different I-net structures, lead to the in the course of the DARPA AI Research Associate (AIRA) Figure 7: The search DAG and a number of viable hypotheses to explain ultrasound wavefield in metal parts. Figure 8: The AI associate discovers the (integral form) of the wave equation as well as the proper length/time scale at which the heterogeneous material properties (in this case, speed of sound) must be defined. program that supported the development of CyPhy. The in- Conclusion put is noisy data obtained by ultrasound imaging, measured in (2+1)D spacetime over the surface of several material Statistical learning methods, despite their accuracy and ef- samples with heterogeneous properties. ficiency in narrow regimes for which they are carefully en- gineered, are not sufficient to independently acquire deep Figure 7 illustrates the search DAG along with a num- understandings of the scientific problems they are applied ber of I-net structures for viable hypotheses, each postulat- to. Human scientists continue to handle most of knowledge- ing the relevance of a conservation law and existence of a centric aspects of the scientific process based on domain- few phenomenological relations. Figure 8 shows the rank- specific insight, experience, and expertise. ing of these hypotheses based on their residual errors when Our novel approach to early-stage scientific hypothesis tested against data. Each hypothesis can be interpreted in generation and testing demonstrates a path forward towards differential, integral, or integro-differential forms. 
The results demonstrate that integral forms applied to wide spatial and temporal neighborhoods (of ∼25 grid elements along each axis) with high-order polynomial underfitting (up to cubic in each coordinate) are preferable to strictly local numerical schemes such as finite-difference discretization: they yield a length/time scale-aware definition of (nonlocal) phenomenological relations as well as physics-compatible (i.e., mimetic) discretization and de-noising.

Conclusion

Statistical learning methods, despite their accuracy and efficiency in the narrow regimes for which they are carefully engineered, are not sufficient to independently acquire a deep understanding of the scientific problems they are applied to. Human scientists continue to handle most knowledge-centric aspects of the scientific process based on domain-specific insight, experience, and expertise.

Our novel approach to early-stage scientific hypothesis generation and testing demonstrates a path forward towards context-aware, generalizable, and interpretable AI for scientific discovery. Our AI associate (CyPhy) distinguishes between non-negotiable mathematical truisms, implied by the relationship between measurement and the presupposed spacetime topology, and phenomenological relations that are at the mercy of empirical learning. Data-driven regression is targeted at the latter, enabling the distillation of governing equations from sparse and noisy data while providing deep insight into the mathematical foundations.

Acknowledgment

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00111990029.

References

Bott, R.; and Tu, L. W. 1982. Differential Forms in Algebraic Topology. Springer Science & Business Media.

Branin, F. H. 1966. The Algebraic-Topological Basis for Network Analogies and the Vector Calculus. In Symposium on Generalized Networks, 453–491. Polytechnic Institute of Brooklyn, NY.

Breen, P. G.; Foley, C. N.; Boekholt, T.; and Zwart, S. P. 2020. Newton versus the Machine: Solving the Chaotic Three-Body Problem Using Deep Neural Networks. Monthly Notices of the Royal Astronomical Society 494(2): 2465–2470.

Butler, K. T.; Davies, D. W.; Cartwright, H.; Isayev, O.; and Walsh, A. 2018. Machine Learning for Molecular and Materials Science. Nature 559(7715): 547–555. doi:10.1038/s41586-018-0337-2.

Cranmer, M. D.; Xu, R.; Battaglia, P.; and Ho, S. 2019. Learning Symbolic Physics with Graph Networks. arXiv preprint arXiv:1909.05862.

Daw, A.; Thomas, R. Q.; Carey, C. C.; Read, J. S.; Appling, A. P.; and Karpatne, A. 2020. Physics-Guided Architecture (PGA) of Neural Networks for Quantifying Uncertainty in Lake Temperature Modeling. In Proceedings of the 2020 SIAM International Conference on Data Mining, 532–540. Society for Industrial and Applied Mathematics (SIAM).

Frankel, T. 2011. The Geometry of Physics: An Introduction. Cambridge University Press.

Hatcher, A. 2001. Algebraic Topology. Cornell University.

Hirani, A. N. 2003. Discrete Exterior Calculus. Ph.D. dissertation, California Institute of Technology.

Iten, R.; Metger, T.; Wilming, H.; Del Rio, L.; and Renner, R. 2020. Discovering Physical Concepts with Neural Networks. Physical Review Letters 124(1): 010508.

Kitano, H. 2016. Artificial Intelligence to Win the Nobel Prize and Beyond: Creating the Engine for Scientific Discovery. AI Magazine 37(1): 39–49.

Koren, B.; Abgrall, R.; Bochev, P.; Frank, J. E.; and Perot, B. 2014. Physics-Compatible Numerical Methods. Journal of Computational Physics 257(Part B): 1039–1039.

Kron, G. 1963. Diakoptics: The Piecewise Solution of Large-Scale Systems, volume 2. MacDonald.

Langley, P. 1998. The Computer-Aided Discovery of Scientific Knowledge. In International Conference on Discovery Science, 25–39. Springer.

Lipnikov, K.; Manzini, G.; and Shashkov, M. 2014. Mimetic Finite Difference Method. Journal of Computational Physics 257: 1163–1227.

Mattiussi, C. 2000. The Finite Volume, Finite Element, and Finite Difference Methods as Numerical Methods for Physical Field Problems. Advances in Imaging and Electron Physics 113: 1–147.

Mehta, P.; Bukov, M.; Wang, C. H.; Day, A. G. R.; Richardson, C.; Fisher, C. K.; and Schwab, D. J. 2019. A High-Bias, Low-Variance Introduction to Machine Learning for Physicists. Physics Reports 810: 1–124.

Miyanawala, T. P.; and Jaiman, R. K. 2017. An Efficient Deep Learning Technique for the Navier-Stokes Equations: Application to Unsteady Wake Flow Dynamics. arXiv preprint.

Nautrup, H. P.; Metger, T.; Iten, R.; Jerbi, S.; Trenkwalder, L. M.; Wilming, H.; Briegel, H. J.; and Renner, R. 2020. Operationally Meaningful Representations of Physical Systems in Neural Networks. arXiv preprint arXiv:2001.00593.

Palha, A.; Rebelo, P. P.; Hiemstra, R.; Kreeft, J.; and Gerritsma, M. 2014. Physics-Compatible Discretization Techniques on Single and Dual Grids, with Application to the Poisson Equation of Volume Forms. Journal of Computational Physics 257: 1394–1422.

Raghu, M.; and Schmidt, E. 2020. A Survey of Deep Learning for Scientific Discovery. arXiv preprint arXiv:2003.11755.

Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2019. Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. Journal of Computational Physics 378: 686–707.

Roth, J. P. 1955. An Application of Algebraic Topology to Numerical Analysis: On the Existence of a Solution to the Network Problem. Proceedings of the National Academy of Sciences 41(7): 518–521.

Rudy, S. H.; Brunton, S. L.; Proctor, J. L.; and Kutz, J. N. 2017. Data-Driven Discovery of Partial Differential Equations. Science Advances 3(4): e1602614.

Sanchez-Gonzalez, A.; Godwin, J.; Pfaff, T.; Ying, R.; Leskovec, J.; and Battaglia, P. W. 2020. Learning to Simulate Complex Physics with Graph Networks. arXiv preprint arXiv:2002.09405.

Schmidt, M.; and Lipson, H. 2009. Distilling Free-Form Natural Laws from Experimental Data. Science 324(5923): 81–85.

Seo, S.; and Liu, Y. 2019. Differentiable Physics-Informed Graph Networks. arXiv preprint arXiv:1902.02950.

Stevens, R.; Taylor, V.; Nichols, J.; Maccabe, A. B.; Yelick, K.; and Brown, D. 2020. AI for Science. Technical report, Argonne National Laboratory (ANL).

Tonti, E. 2013. The Mathematical Structure of Classical and Relativistic Physics: A General Classification Diagram. Modeling and Simulation in Science, Engineering and Technology. Birkhäuser. ISBN 9781461474210.

Udrescu, S. M.; and Tegmark, M. 2020. AI Feynman: A Physics-Inspired Method for Symbolic Regression. Science Advances 6(16): eaay2631.

Wang, R.; Kashinath, K.; Mustafa, M.; Albert, A.; and Yu, R. 2020. Towards Physics-Informed Deep Learning for Turbulent Flow Prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1457–1466.

Wei, Z.; and Chen, X. 2019. Physics-Inspired Convolutional Neural Network for Solving Full-Wave Inverse Scattering Problems. IEEE Transactions on Antennas and Propagation 67(9): 6138–6148.

Wu, T.; and Tegmark, M. 2019. Toward an AI Physicist for Unsupervised Learning. arXiv preprint arXiv:1810.10525.

Yang, K. K.; Wu, Z.; and Arnold, F. H. 2019. Machine-Learning-Guided Directed Evolution for Protein Engineering. Nature Methods 16(8): 687–694. doi:10.1038/s41592-019-0496-6.