<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generalized Physics-Informed Learning Through Language-Wide Differentiable Programming</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chris Rackauckas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan Edelman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Keno Fischer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mike Innes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elliot Saba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viral B. Shah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Will Tebbutt</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Julia Computing</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Massachusetts Institute of Technology</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Cambridge 182 Memorial Dr</institution>
          ,
          <addr-line>Cambridge, MA 02142</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Maryland</institution>
          ,
          <addr-line>Baltimore</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Scientific computing is increasingly incorporating the advancements in machine learning to allow for data-driven physics-informed modeling approaches. However, re-targeting existing scientific computing workloads to machine learning frameworks is both costly and limiting, as scientific simulations tend to use the full feature set of a general purpose programming language. In this manuscript we develop an infrastructure for incorporating deep learning into existing scientific computing code through Differentiable Programming (∂P). We describe a ∂P system that is able to take gradients of full Julia programs, making Automatic Differentiation a first class language feature and compatibility with deep learning pervasive. Our system utilizes the one-language nature of Julia package development to augment the existing package ecosystem with deep learning, supporting almost all language constructs (control flow, recursion, mutation, etc.) while generating high-performance code without requiring any user intervention or refactoring to stage computations. We showcase several examples of physics-informed learning which directly utilize this extension to existing simulation code: neural surrogate models, machine learning on simulated quantum hardware, and data-driven stochastic dynamical model discovery with neural stochastic differential equations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A casual practitioner might think that scientific
computing and machine learning are different scientific disciplines.
Modern machine learning has made its mark through
breakthroughs in neural networks. Their applicability towards
solving a large class of difficult problems in computer science has
led to the design of new hardware and software to process
extreme amounts of labelled training data, while simultaneously
deploying trained models in devices. Scientific computing,
in contrast, a discipline that is perhaps as old as computing
itself, tends to use a broader set of modelling techniques
arising out of the underlying physical phenomena. Compared to
the typical machine learning researcher, many computational
scientists work with smaller volumes of data but with more
computational complexity and range. However, recent results
like Physics-Informed Neural Networks (PINNs) suggest that
data-efficient machine learning for scientific applications can
be found at the intersection of the methods
        <xref ref-type="bibr" rid="ref41 ref45">(Raissi, Perdikaris,
and Karniadakis 2019)</xref>
        . While a major advance, this
technique requires re-implementing partial differential equation
simulation techniques, such as Runge-Kutta methods, in the
context of machine learning frameworks like TensorFlow.
Likewise, it would be impractical to require every scientific
simulation suite to re-target their extensive stiff differential
equation solver and numerical linear algebra stacks to
specific machine learning libraries for specific tasks. In order
to truly scale physics-informed learning to big science
applications, we see a need for efficiently incorporating neural
networks into existing scientific simulation suites.
      </p>
      <p>
        Differentiable Programming (∂P) has the potential to be
the lingua franca that can further unite the worlds of
scientific computing and machine learning. Here we present
a ∂P system which allows for embedding deep neural
networks into arbitrary existing scientific simulations, enabling
these packages to automatically build surrogates for
ML-acceleration and to learn missing functions from data. Previous
work has shown that differentiable programming systems
in domain specific languages for image processing can
allow for machine learning integration into the domain
applications in a programmer-friendly manner
        <xref ref-type="bibr" rid="ref32 ref33">(Li et al. 2018;
Li 2019)</xref>
        . Supporting multiple languages within a single ∂P
system causes an explosion in complexity, vastly increasing
the developer effort required, and effectively excludes ecosystems
whose packages are developed across multiple languages, as with
Cython or Rcpp. For this reason we extend the full Julia
programming language
        <xref ref-type="bibr" rid="ref8">(Bezanson et al. 2017)</xref>
        with
differentiable programming capabilities in a way that allows existing
packages to incorporate deep learning. By choosing the Julia
language, we arrive at an abundance of pure-Julia packages
for both machine learning and scientific computing with both
speed and automatic compatibility, allowing us to test our
ideas on fairly large real-world applications.
      </p>
      <p>Our system can be directly used on existing Julia
packages, handling user-defined types, general control flow
constructs, and plentiful scalar operations through
source-to-source mixed-mode automatic differentiation, composing the
reverse-mode capabilities of Zygote.jl and Tracker.jl with
ForwardDiff.jl for a "best of both worlds" approach. In this
paper we briefly describe how we achieve our goals for a
∂P system and showcase its ability to solve problems which
mix machine learning and pre-existing scientific simulation
packages.</p>
      <sec id="sec-1-1">
        <title>A simple sin example: Differentiate Programs not</title>
      </sec>
      <sec id="sec-1-2">
        <title>Formulas</title>
        <p>We start out with a very simple example: differentiating sin(x)
written as a program through its Taylor series:</p>
        <disp-formula><tex-math>\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots</tex-math></disp-formula>
        <p>Note that the number of terms will not be fixed, but will
depend on x through a numerical convergence criterion.</p>
        <p>To run, install Julia v1.1 or higher, and install the Zygote.jl
and ForwardDiff.jl packages with:</p>
        <preformat>
using Pkg
Pkg.add("Zygote")
Pkg.add("ForwardDiff")
using Zygote, ForwardDiff

function s(x)
    t = 0.0
    sign = -1.0
    for i in 1:19
        if isodd(i)
            newterm = x^i / factorial(i)
            # stop once terms fall below the convergence tolerance
            abs(newterm) &lt; 1e-8 &amp;&amp; return t
            sign = -sign
            t += sign * newterm
        end
    end
    return t
end
</preformat>
        <p>While the Taylor series for sine could have been written
more compactly in Julia, for purposes of illustrating more
complex programs, we purposefully used a loop, a
conditional, and function calls to isodd and factorial, which
are native Julia implementations. AD just works, and that is
the powerful part of the Julia approach. Let's compute the
gradient at x = 1.0 and check whether it matches cos(1.0):</p>
        <preformat>
julia&gt; ForwardDiff.derivative(s, 1.0)
0.540302303791887

julia&gt; Zygote.gradient(s, 1.0)
(0.5403023037918872,)

julia&gt; cos(1.0)
0.5403023058681398
</preformat>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Implementation</title>
      <p>
        Recent progress in tooling for automatic differentiation (AD)
has been driven primarily by the machine learning
community. Many state-of-the-art reverse-mode AD tools such as
Tracker.jl
        <xref ref-type="bibr" rid="ref17 ref27 ref29">(Innes et al. 2018; Gandhi et al. 2019)</xref>
        , PyTorch
        <xref ref-type="bibr" rid="ref39">(PyTorch Team 2018)</xref>
        , JAX
        <xref ref-type="bibr" rid="ref30">(Johnson et al. 2018)</xref>
        , and
TensorFlow
        <xref ref-type="bibr" rid="ref1">(Abadi et al. 2016)</xref>
        (in the recent Eager version) employ
tracing methods to extract simplified program representations
that are more easily amenable to AD transforms. These traces
evaluate derivatives only at specific points in the program
space. Unfortunately, this generally unrolls control flow (i.e.
building a tape that requires O(n) memory instead of
keeping the loop construct intact) and requires compilation and
optimization for every new input.
      </p>
      <p>
        This choice has been driven largely by the fact that, as
the JAX authors put it, “ML workloads often consist of
large, accelerable, pure-and-statically-composed (PSC)
operations”
        <xref ref-type="bibr" rid="ref30">(Johnson et al. 2018)</xref>
        . Indeed, for many ML models
the per-executed-operation overhead (in both time and
memory) incurred by tracing-based AD systems is immaterial,
because the execution time and memory requirements of
the operations dwarf any AD overhead.
      </p>
      <p>However, this assumption does not hold for many scientific
inverse problems, or even the cutting edge of ML research.
Instead, these problems require a ∂P system capable of:</p>
      <list list-type="order">
        <list-item><p>Low overhead, independent of the size of the executed operation</p></list-item>
        <list-item><p>Efficient support for control flow</p></list-item>
        <list-item><p>Complete, efficient support for user-defined data types</p></list-item>
        <list-item><p>Customizability</p></list-item>
        <list-item><p>Composability with existing code unaware of ∂P</p></list-item>
        <list-item><p>Dynamism</p></list-item>
      </list>
        <p>In particular, scientific programs tend to have adaptive
algorithms whose control flow depends on error estimates
and thus on the current state of the simulation; numerous scalar
operations; large nonlinear models defined term by term or
through specialized numerical linear algebra routines; and
pervasive use of user-defined data structures to describe
model components, which require efficient
memory handling (stack allocation) in order for the problem
to be computationally feasible.</p>
        <p>
          To handle these kinds of problems, Zygote does not utilize
the standard methodology and instead generates a derivative
function directly from the original source which is able to
handle all input values. This is called a source-to-source
transformation, a methodology with a long history (Baydin et al.
2018) going back at least to the ADIFOR source-to-source
AD program for FORTRAN 77
          <xref ref-type="bibr" rid="ref10">(Bischof et al. 1996)</xref>
          . Using
this source-to-source formulation, Zygote can then compile,
heavily optimize, and re-use a single gradient definition
for all input values. Significantly, this transformation keeps
control flow intact: loops are not unrolled, allowing all
possible branches to be handled in a memory-efficient form. However, where
prior source-to-source AD work has often focused on static
languages, Zygote expands upon this idea by supporting a
full, dynamic, high-level language, Julia, in a way that
allows its existing scientific and machine learning package
ecosystem to benefit from this tool.
        </p>
        <sec id="sec-2-3-1">
          <title>Generality, Flexibility, and Composability</title>
          <p>One of the primary design decisions of a ∂P system is how
these capabilities should be exposed to the user. One
convenient way to do so is using a differential operator J that
operates on first class functions and once again returns a
first class function (by returning a function we automatically
obtain higher order derivatives, through repeated application
of J). Applied to a composition f ∘ g, it recursively applies
the chain rule (cf. figure 1):</p>
          <preformat>
function J(f ∘ g)(x)
    a, da = J(g)(x)
    b, db = J(f)(a)
    b, z -&gt; da(db(z))
end
</preformat>
          <p>There are several valid choices for this differential
operator, but a convenient choice is</p>
          <p>
            <disp-formula><tex-math>\mathcal{J}(f) := x \mapsto \left(f(x),\; z \mapsto J_f(x)\, z\right),</tex-math></disp-formula>
i.e. J(f)(x) returns the value of f at x, as well as a
function which evaluates the jacobian-vector product between
J_f(x) and some vector of sensitivities z. From this primitive
we can define the gradient of a scalar function g : R^n → R,
written as:</p>
          <p>
            <disp-formula><tex-math>\nabla g(x) := [\mathcal{J}(g)(x)]_2(1)</tex-math></disp-formula>
([·]_2 selects the second value of the tuple, 1 = ∂z/∂z is the
initial sensitivity).</p>
          <p>This choice of differential operator is convenient for
several reasons: (1) The computation of the forward pass often
computes values that can be re-used for the computation of
the backwards pass. By combining the two operations, it is
easy to re-use this work. (2) It can be used to recursively
implement the chain rule (see figure 1).</p>
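          <p>As an illustration, the following is a minimal, self-contained
sketch of this operator on two hard-coded primitives; the names
J_compose and grad are ours for exposition, not part of Zygote's API:</p>
          <preformat>
# Hand-written J for two primitives: each returns (value, pullback).
J(::typeof(sin)) = x -&gt; (sin(x), z -&gt; z * cos(x))
J(::typeof(cos)) = x -&gt; (cos(x), z -&gt; -z * sin(x))

# The chain-rule transform for a composition f ∘ g.
J_compose(f, g) = x -&gt; begin
    a, da = J(g)(x)      # forward pass through g
    b, db = J(f)(a)      # forward pass through f
    (b, z -&gt; da(db(z)))  # compose the pullbacks in reverse order
end

# Gradient: select the pullback ([]_2) and seed it with 1.
grad(h, x) = h(x)[2](1.0)

grad(J(sin), 0.0)               # 1.0 == cos(0.0)
grad(J_compose(sin, cos), 0.0)  # d/dx sin(cos(x)) at x = 0
</preformat>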
          <p>
            This second property also suggests the implementation
strategy: hard code the operation of J on a set of
primitive f ’s and let the AD system generate the rest by repeated
application of the chain rule transform. This same general
approach has been implemented in many systems
            <xref ref-type="bibr" rid="ref36 ref47">(Pearlmutter and Siskind 2008; Wang et al. 2018)</xref>
            and a detailed
description of how to perform this on Julia’s SSA form IR is
available in earlier work
            <xref ref-type="bibr" rid="ref27 ref29">(Innes 2018)</xref>
            .
          </p>
          <p>However, to achieve our extensibility and composability
goals, we implement a slight twist on this scheme. We define
a fully user-extensible function ∂ that provides a default
fallback as follows:
          <p>
            <disp-formula><tex-math>\partial(f)(\mathrm{args}\ldots) = \mathcal{J}(f)(\mathrm{args}\ldots),</tex-math></disp-formula>
where the implementation that is generated automatically
by J recurses to ∂ rather than J and can thus easily be
intercepted using Julia's standard multiple dispatch system
at any level of the stack. For example, we might make the
following definitions:</p>
          <preformat>
∂(::typeof(+))(a::IntOrFloat, b::IntOrFloat) = a + b, z -&gt; (z, z)
∂(::typeof(*))(a::IntOrFloat, b::IntOrFloat) = a * b, z -&gt; (z * b, a * z)
</preformat>
          <p>i.e. declaring how to compute the partial derivatives of +
and * for two integer- or float-valued numbers, while
simultaneously leaving the same unconstrained for other functions or
other types of values (which will thus fall back to applying
the AD transform). With these two definitions, any program
that is ultimately just a composition of + and * operations
on real numbers will work. We show a simple example in
figure 2. Here, we used the user-defined Measurement type
from the Measurements.jl package
          <xref ref-type="bibr" rid="ref21">(Giordano 2016)</xref>
          . We did
not have to define how to differentiate the ^ function or how
to differentiate + and * on a Measurement, nor did the
Measurements.jl package have to be aware of the AD system in
order to be differentiated. Thus standard user definitions of
types are compatible with the differentiation system.</p>
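          <p>As a minimal sketch in the spirit of figure 2 (the exact function
used there may differ), differentiating through a Measurement
requires nothing beyond loading the two packages:</p>
          <preformat>
using Zygote, Measurements

# An ordinary Julia function; Measurements.jl knows nothing of Zygote.
f(x) = x^2 + 2x

x = 3.0 ± 0.1          # a value with an attached uncertainty
Zygote.gradient(f, x)  # the derivative 2x + 2, propagated as a Measurement
</preformat>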
          <p>This extra, user-extensible layer of indirection has a number of
important consequences:
          </p>
        </sec>
        <sec id="sec-2-3-2">
          <title>The AD system does not depend on, nor require any</title>
          <p>
            knowledge of primitives on new types. By default we
provide implementations of the differential operator for
many common scalar mathematical and linear algebra
operations, written with a scalar LLVM backend and
BLAS-like linear algebra operations. This means that even when
Julia builds an array type to target TPUs
            <xref ref-type="bibr" rid="ref16 ref27">(Fischer and Saba
2018)</xref>
            , its XLA IR primitives are able to be used and
differentiated without fundamental modifications to our system.
          </p>
        </sec>
        <sec id="sec-2-3-3">
          <title>Custom gradients become trivial</title>
          <p>Since all operations indirect through ∂, there is no difference between
user-defined custom gradients and those provided by the
system. They are written using the same mechanism, are
co-optimized by the compiler, and can be finely targeted using
Julia's multiple dispatch mechanism.</p>
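          <p>In the released Zygote package this mechanism is exposed through
the @adjoint macro; a minimal sketch (mynorm is our own illustrative
function, not from any package):</p>
          <preformat>
using Zygote

mynorm(x, y) = sqrt(x^2 + y^2)

# Custom gradient: return the primal value together with a pullback.
Zygote.@adjoint mynorm(x, y) = mynorm(x, y), z -&gt; begin
    n = mynorm(x, y)
    (z * x / n, z * y / n)
end

Zygote.gradient(mynorm, 3.0, 4.0)  # (0.6, 0.8)
</preformat>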
          <p>Since Julia solves the two language problem, its Base,
standard library, and package ecosystem are almost entirely pure
Julia. Thus, since our @P system does not require primitives
to handle new types, this means that almost all functions
and types defined throughout the language are automatically
supported by Zygote, and users can easily accelerate specific
functions as they deem necessary.</p>
        </sec>
        <sec id="sec-2-3-4">
          <title>Surrogates for Realtime ML-Acceleration of</title>
        </sec>
        <sec id="sec-2-3-5">
          <title>Inverse Problems</title>
          <p>
            Model-based reinforcement learning has advantages over
model-agnostic methods, given that an effective agent must
approximate the dynamics of its environment
            <xref ref-type="bibr" rid="ref3">(Atkeson and
Santamaria 1997)</xref>
            . However, model-based approaches have
been hindered by the inability to incorporate realistic
environmental models into deep learning models. Previous
work has had success re-implementing physics engines
using machine learning frameworks
            <xref ref-type="bibr" rid="ref13 ref14">(Degrave et al. 2019;
de Avila Belbute-Peres et al. 2018)</xref>
            , but this effort has a
large engineering cost, has limitations compared to existing
engines, and has limited applicability to other domains such
as biology or meteorology. For example, one notable study
showed that solving the 3-body problem can be
accelerated 100 million times by using a neural surrogate approach,
but this required writing a simulator that was compatible with the
chosen neural network framework (Breen et al. 2019).
          </p>
          <p>
            Zygote can be used for control problems, incorporating the
model into backpropagation with one call to gradient. We
pick trebuchet dynamics as a motivating example. Instead of
aiming at a single target, we optimize a neural surrogate that
can aim it given any target. The neural net takes two inputs,
the target distance in metres and the current wind speed. The
network outputs trebuchet settings (the mass of the
counterweight and the angle of release) that get fed into a simulator
which solves an ODE and calculates the achieved distance.
We compare to our target and backpropagate through the
entire chain to adjust the weights of the network. Our dataset is
a randomly chosen set of targets and wind speeds. An agent
that aims a trebuchet to a given target can thus be trained in a
few minutes on a laptop CPU, resulting in a network which
is a surrogate to the inverse problem that allows for aiming
in constant-time. Given that the forward-pass of this network
is about 100× faster than performing the full optimization
on the trebuchet system (Figure 3), this surrogate technique
gives the ability to decouple real-time model-based control
from the simulation cost through a pre-trained neural
network surrogate to the inverse. We present the code for this
and other common reinforcement learning examples such as
the cartpole and inverted pendulum
            <xref ref-type="bibr" rid="ref17 ref28 ref41 ref44">(Innes, Joy, and Karmali
2019)</xref>
            .
          </p>
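          <p>A minimal sketch of this training loop follows; here
simulate_distance stands in for the differentiable trebuchet ODE
simulation and is assumed rather than defined, and the network sizes
and sampling ranges are illustrative only:</p>
          <preformat>
using Flux, Zygote

# (target distance, wind speed) -&gt; (counterweight mass, release angle)
model = Chain(Dense(2, 16, tanh), Dense(16, 2))

function loss(target, wind)
    mass, angle = model([target, wind])
    achieved = simulate_distance(mass, angle, wind)  # differentiable ODE solve
    (achieved - target)^2
end

opt = Descent(1e-3)
for _ in 1:1000
    target, wind = 50 + 50 * rand(), 5 * rand()  # random target and wind
    gs = gradient(() -&gt; loss(target, wind), Flux.params(model))
    Flux.Optimise.update!(opt, Flux.params(model), gs)
end
</preformat>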
        </sec>
        <sec id="sec-2-3-6">
          <title>Quantum Machine Learning</title>
          <p>
            A promising potential application and
research direction for Noisy Intermediate-Scale Quantum
(NISQ) technology
            <xref ref-type="bibr" rid="ref37">(Preskill 2018)</xref>
            is variational quantum
circuits
            <xref ref-type="bibr" rid="ref41 ref7">(Benedetti, Lloyd, and Sack 2019)</xref>
            , where a quantum
circuit is parameterized by classical parameters in its quantum
gates, which may place fewer requirements on the hardware.
In many cases, the classical parameters are optimized
with classical gradient-based algorithms. Designing quantum
circuits by hand is hard, however, so gradient-based
search over circuit architectures for a given task is a
potentially interesting direction that AD support makes
easier to explore.
          </p>
          <p>
            One such state of the art simulator is the Yao.jl
            <xref ref-type="bibr" rid="ref48">(zhe Luo et
al. 2019)</xref>
            quantum simulator project. Yao.jl is implemented
in Julia and thus composable with our AD technology. There
are a number of interesting applications of this combination
            <xref ref-type="bibr" rid="ref34">(Mitarai et al. 2018)</xref>
            .
          </p>
          <p>
            A subtle application is to perform traditional AD of the
quantum simulator itself. As a simple example of this
capability, we consider a Variational Quantum Eigensolver (VQE)
            <xref ref-type="bibr" rid="ref31">(Kandala et al. 2017)</xref>
            . A variational quantum eigensolver is
used to compute the eigenvalue of some matrix H
(generally the Hamiltonian of some quantum system for which the
eigenvalue problem is hard to solve classically, but that is
easily encoded into quantum hardware). This is done by using
a variational quantum circuit Ψ(θ) to prepare some
quantum state |ψ⟩ = Ψ(θ)|0⟩, measuring the expectation value
⟨ψ|H|ψ⟩, and then using a classical optimizer to adjust θ to
minimize the measured value. In our example, we will use a
4-site toy Hamiltonian corresponding to an anti-ferromagnetic
Heisenberg chain:
          </p>
          <disp-formula><tex-math>H = \frac{1}{4} \sum_{\langle i,j \rangle} \left[ \sigma_i^x \sigma_j^x + \sigma_i^y \sigma_j^y + \sigma_i^z \sigma_j^z \right]</tex-math></disp-formula>
          <p>We use a standard differentiable variational quantum
circuit composed of layers (2 in our example) of (parameterized)
rotations and CNOT entanglers with randomly initialized
rotation angles. Figure 4 shows the result of the minimization
process as performed using gradients provided by Zygote.</p>
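          <p>To convey the idea without depending on Yao.jl's API, the following
is a plain-Julia toy of the same variational principle: build the 4-site
Heisenberg H as a dense matrix, parameterize a normalized state directly
(a crude stand-in for the rotation/CNOT ansatz), and descend the energy
with Zygote gradients:</p>
          <preformat>
using Zygote, LinearAlgebra

# Pauli matrices and an operator acting on site i of a 4-site chain.
const I2 = [1.0 0; 0 1]
X = [0.0 1; 1 0]; Y = [0.0 -im; im 0]; Z = [1.0 0; 0 -1]
op(s, i) = kron([j == i ? s : I2 for j in 1:4]...)

# H = 1/4 Σ (σᵢˣσⱼˣ + σᵢʸσⱼʸ + σᵢᶻσⱼᶻ) over neighboring pairs.
H = sum(op(s, i) * op(s, i + 1) for i in 1:3, s in (X, Y, Z)) / 4

# Energy of the normalized state parameterized by θ.
energy(θ) = real(normalize(θ)' * H * normalize(θ))

θ = randn(16)
for _ in 1:200                 # plain gradient descent on the energy
    global θ -= 0.1 * Zygote.gradient(energy, θ)[1]
end
</preformat>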
        </sec>
        <sec id="sec-2-3-7">
          <title>Data-Driven Stochastic Dynamical Model</title>
        </sec>
        <sec id="sec-2-3-8">
          <title>Discovery with Neural Stochastic Differential</title>
        </sec>
        <sec id="sec-2-3-9">
          <title>Equations</title>
          <p>
            Neural latent differential equations
            <xref ref-type="bibr" rid="ref22 ref25 ref41 ref44 ref49">(Chen et al. 2018;
Álvarez, Luengo, and Lawrence 2009; Hu et al. 2013;
Rackauckas et al. 2019)</xref>
            incorporate a neural network into
the ODE derivative function. Recent results have shown that
many deep learning architectures can be compacted and
generalized through neural ODE descriptions
            <xref ref-type="bibr" rid="ref15 ref22 ref22 ref24 ref41">(Chen et al. 2018;
He et al. 2016; Dupont, Doucet, and Whye Teh 2019;
Grathwohl et al. 2018)</xref>
            . Latent differential equations have also seen
use in time series extrapolation
            <xref ref-type="bibr" rid="ref19">(Gao et al. 2008)</xref>
            and model
reduction
            <xref ref-type="bibr" rid="ref23 ref35 ref36 ref4 ref41 ref42 ref46">(Ugalde et al. 2013; Hartman and Mestha 2017;
Bar-Sinai et al. 2018; Ordaz-Hernandez, Fischer, and Bennis
2008)</xref>
            .
          </p>
          <p>Here we demonstrate automatic construction of Langevin
equations from data using neural stochastic differential
equations (SDEs). Typically, Langevin equation models can be
written in the form:</p>
          <disp-formula><tex-math>dX_t = f(X_t)\,dt + g(X_t)\,dW_t,</tex-math></disp-formula>
          <p>where f : R^n → R^n and g : R^n → R^{n×m}, with W_t the
m-dimensional Wiener process.</p>
          <p>
            To automatically learn such a physical model, we can
replace the drift function f and the diffusion function g
with neural networks and train these against time series
data by repeatedly solving the differential equation 10
times and using the average gradient of the cost function
with respect to the parameters of the neural network. Our simulator
utilizes high strong-order adaptive integrators provided
by DifferentialEquations.jl
            <xref ref-type="bibr" rid="ref23 ref31 ref41 ref42 ref8">(Rackauckas and Nie 2017a;
2017b)</xref>
            . Figure 5 depicts a two-dimensional neural SDE
trained using the ℓ2-norm distance between the solution
and the data points. Included is a forecast of the neural SDE
solution beyond the data to a final time of 1.2, showcasing a
potential use case for time series extrapolation from a learned
dynamical model.
          </p>
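          <p>A minimal sketch of the forward model follows (the training loop
and data are omitted; DiffEqFlux.jl provides the full machinery used
here, and the network sizes are illustrative):</p>
          <preformat>
using DifferentialEquations, Flux

# Drift f and (diagonal) diffusion g replaced by small neural networks.
drift     = Chain(Dense(2, 16, tanh), Dense(16, 2))
diffusion = Chain(Dense(2, 16, tanh), Dense(16, 2))

f(u, p, t) = drift(u)
g(u, p, t) = diffusion(u)

u0 = [1.0, 0.5]
prob = SDEProblem(f, g, u0, (0.0, 1.2))
sol = solve(prob, SOSRI(); saveat = 0.1)  # adaptive, high strong-order solver
</preformat>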
          <p>
            The analytical formula for the adjoint of the strong solution
of an SDE is difficult to calculate efficiently due to the lack of
classical differentiability of the solution. However, Zygote
still manages to calculate a useful derivative for
optimization with respect to single solutions by treating the Brownian
process as fixed and applying forward-mode automatic
differentiation, showcasing Zygote’s ability to efficiently optimize
its AD through mixed-mode approaches
            <xref ref-type="bibr" rid="ref43">(Rackauckas et al.
2018)</xref>
            . Common numerical techniques require computing
the gradient of a difference over thousands of
trajectories to obtain an average cost, while our numerical
experiments suggest that with Zygote it is sufficient to
perform gradient descent on a neural SDE using only a single or
a few trajectories, reducing the overall computational cost
by a factor of thousands. This methodological advance, combined
with GPU-accelerated high-order adaptive SDE integrators
in DifferentialEquations.jl, makes a whole new field of study
possible.
          </p>
          <p><bold>Acknowledgments.</bold> This work would not have been
possible without the work of many people. We are particularly
indebted to our users that contributed the examples in this
paper. Thanks to Roger Luo and JinGuo Liu for the quantum
experiments. Thanks also to Zenna Tavares, Jesse Bettencourt
and Lyndon White for being early adopters of Zygote and its
underlying technology, and shaping its development. Much
credit is also due to the core Julia language and compiler
team for supporting Zygote’s development, including
Jameson Nash, Jeff Bezanson and others. Thanks also to James
Bradbury for helpful discussions on this paper.</p>
        </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Barham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Devin,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Ghemawat</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          ; Irving,
          <string-name>
            <surname>G.</surname>
          </string-name>
          ; Isard,
          <string-name>
            <surname>M.</surname>
          </string-name>
          ; et al.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>Tensorflow: A system for large-scale machine learning</article-title>
          .
          <source>In 12th fUSENIXg Symposium on Operating Systems Design and Implementation (fOSDIg 16)</source>
          ,
          <fpage>265</fpage>
          -
          <lpage>283</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Atkeson</surname>
            ,
            <given-names>C. G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Santamaria</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          <year>1997</year>
          .
          <article-title>A comparison of direct and model-based reinforcement learning</article-title>
          .
          <source>In Proceedings of International Conference on Robotics and Automation</source>
          , volume
          <volume>4</volume>
          ,
          <fpage>3557</fpage>
          -
          <lpage>3564</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Bar-Sinai</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hoyer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hickey</surname>
            , J.; and Brenner,
            <given-names>M. P.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Data-driven discretization: machine learning for coarse graining of partial differential equations</article-title>
          . arXiv e-prints arXiv:
          <year>1808</year>
          .04930.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          2018.
          <article-title>Automatic differentiation in machine learning: a survey.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>Journal of Machine Learning Research</source>
          <volume>18</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Benedetti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lloyd</surname>
          </string-name>
          , E.; and
          <string-name>
            <surname>Sack</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Parameterized quantum circuits as machine learning models</article-title>
          . arXiv preprint arXiv:
          <year>1906</year>
          .07682.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Bezanson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Edelman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Karpinski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>V. B.</given-names>
          </string-name>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>Julia: A fresh approach to numerical computing</article-title>
          .
          <source>SIAM Review</source>
          <volume>59</volume>
          (
          <issue>1</issue>
          ):
          <fpage>65</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Bischof</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Khademi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mauer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Carle</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>1996</year>
          .
          <article-title>ADIFOR 2.0: Automatic differentiation of Fortran 77 programs</article-title>
          .
          <source>IEEE Computational Science and Engineering</source>
          <volume>3</volume>
          (
          <issue>3</issue>
          ):
          <fpage>18</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          2019.
          <article-title>Newton vs the machine: solving the chaotic three-body problem using deep neural networks</article-title>
          . arXiv e-prints arXiv:
          <year>1910</year>
          .07291.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          2018.
          <article-title>Neural ordinary differential equations</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          ,
          <volume>6571</volume>
          -
          <fpage>6583</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>de Avila Belbute-Peres</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tenenbaum</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Kolter</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>End-to-end differentiable physics for learning and control</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          ,
          <volume>7178</volume>
          -
          <fpage>7189</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Degrave</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Hermans,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Dambre</surname>
          </string-name>
          , J.; and wyffels,
          <string-name>
            <surname>F.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>A differentiable physics engine for deep learning in robotics</article-title>
          .
          <source>Frontiers in Neurorobotics 13.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Dupont</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Doucet</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <given-names>Whye</given-names>
            <surname>Teh</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Augmented Neural ODEs</article-title>
          . arXiv e-prints arXiv:
          <year>1904</year>
          .01681.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Saba</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Automatic full compilation of Julia programs and ML models to cloud TPUs</article-title>
          . CoRR abs/
          <year>1810</year>
          .09868.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Gandhi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; Innes,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Saba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ;
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ; and
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <article-title>Julia e Flux: Modernizando o Aprendizado de Máquina [Julia and Flux: Modernizing Machine Learning]</article-title>
          .
          <volume>5</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Honkela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rattray</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and Lawrence,
          <string-name>
            <surname>N. D.</surname>
          </string-name>
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities</article-title>
          .
          <source>Bioinformatics</source>
          <volume>24</volume>
          (16):
          <fpage>i70</fpage>
          -
          <lpage>i75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Giordano</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Uncertainty propagation with functionally correlated quantities</article-title>
          .
          <source>ArXiv</source>
          e-prints.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Grathwohl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Chen,
          <string-name>
            <given-names>R. T. Q.</given-names>
            ;
            <surname>Bettencourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ;
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.;</given-names>
            and
            <surname>Duvenaud</surname>
          </string-name>
          ,
          <string-name>
            <surname>D. K.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>FFJORD: free-form continuous dynamics for scalable reversible generative models</article-title>
          . CoRR abs/
          <year>1810</year>
          .01367.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Hartman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mestha</surname>
            ,
            <given-names>L. K.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A deep learning framework for model reduction of dynamical systems</article-title>
          .
          <source>In 2017 IEEE Conference on Control Technology and Applications (CCTA)</source>
          ,
          <year>1917</year>
          -
          <fpage>1922</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Boker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Neale,
          <string-name>
            <given-names>M.</given-names>
            ; and
            <surname>Klump</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          <year>2013</year>
          .
          <article-title>Coupled latent differential equation with moderators: Simulation and application</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <source>Psychological methods 19.</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Innes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Saba</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gandhi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; Rudilosso,
          <string-name>
            <given-names>M. C.</given-names>
            ;
            <surname>Joy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            ;
            <surname>Karmali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ;
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ; and
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>Fashionable modelling with Flux</article-title>
          . CoRR abs/
          <year>1811</year>
          .01457.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Innes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Joy</surname>
            ,
            <given-names>N. M.</given-names>
          </string-name>
          ; and Karmali,
          <string-name>
            <surname>T.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Reinforcement learning vs. differentiable programming</article-title>
          . https://fluxml.ai/2019/03/05/dp-vs-rl.html.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Innes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Don't unroll adjoint: Differentiating SSA-form programs</article-title>
          . CoRR abs/
          <year>1810</year>
          .07951.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Johnson</surname>
          </string-name>
          , M.;
          <string-name>
            <surname>Frostig</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; Maclaurin,
          <string-name>
            <given-names>D.</given-names>
            ; and
            <surname>Leary</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>JAX: Autograd and XLA</article-title>
          . https://github.com/google/jax.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Kandala</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mezzacapo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Temme</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Takita</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Brink</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chow</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Gambetta</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets</article-title>
          .
          <source>Nature</source>
          <volume>549</volume>
          (
          <issue>7671</issue>
          ):
          <fpage>242</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          -M.;
          <string-name>
            <surname>Gharbi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Durand</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ragan-Kelley</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Differentiable programming for image processing and deep learning in Halide</article-title>
          .
          <source>ACM Trans. Graph</source>
          .
          <source>(Proc. SIGGRAPH</source>
          )
          <volume>37</volume>
          (
          <issue>4</issue>
          ):
          <volume>139</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>139</lpage>
          :
          <fpage>13</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.-M.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Differentiable Visual Computing</article-title>
          . arXiv e-prints arXiv:
          <year>1904</year>
          .12228.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Mitarai</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Negoro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kitagawa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Fujii</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Quantum circuit learning</article-title>
          .
          <source>Physical Review A</source>
          <volume>98</volume>
          (
          <issue>3</issue>
          ):
          <fpage>032309</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <surname>Ordaz-Hernandez</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Bennis</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Model reduction technique for mechanical behaviour modelling: Efficiency criteria and validity domain assessment</article-title>
          .
          <source>Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science</source>
          <volume>222</volume>
          (
          <issue>3</issue>
          ):
          <fpage>493</fpage>
          -
          <lpage>505</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <surname>Pearlmutter</surname>
            ,
            <given-names>B. A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Siskind</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator</article-title>
          .
          <source>ACM Transactions on Programming Languages and Systems (TOPLAS) 30</source>
          (
          <issue>2</issue>
          ):
          <fpage>7</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>Preskill</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Quantum computing in the NISQ era and beyond</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <source>Quantum</source>
          <volume>2</volume>
          :
          <fpage>79</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <collab>PyTorch Team</collab>
          .
          <year>2018</year>
          .
          <article-title>The road to 1.0: production ready PyTorch</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          https://pytorch.org/blog/a-year-in/. Accessed: 2018-09-22.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <surname>Rackauckas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <year>2017a</year>
          .
          <article-title>DifferentialEquations.jl - a performant and feature-rich ecosystem for solving differential equations in Julia</article-title>
          .
          <source>Journal of Open Research Software</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <string-name>
            <surname>Rackauckas</surname>
            ,
            <given-names>C. V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <year>2017b</year>
          .
          <article-title>Adaptive methods for stochastic differential equations via natural embeddings and rejection sampling with memory</article-title>
          .
          <source>Discrete and Continuous Dynamical Systems. Series B</source>
          <volume>22</volume>
          (
          <issue>7</issue>
          )
          :
          <fpage>2731</fpage>
          -
          <lpage>2761</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <string-name>
            <surname>Rackauckas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dixit</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Innes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Revels</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nyberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ivaturi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions</article-title>
          . arXiv e-prints arXiv:1812.01892.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <string-name>
            <surname>Rackauckas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Innes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bettencourt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>White</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Dixit</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>DiffEqFlux.jl - A Julia library for neural differential equations</article-title>
          . CoRR abs/1902.02376.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <string-name>
            <surname>Raissi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Perdikaris</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Karniadakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations</article-title>
          .
          <source>Journal of Computational Physics</source>
          <volume>378</volume>
          :
          <fpage>686</fpage>
          -
          <lpage>707</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <string-name>
            <surname>Ugalde</surname>
            ,
            <given-names>H. M. R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Carmona</surname>
            ,
            <given-names>J.-C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Alvarado</surname>
            ,
            <given-names>V. M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Reyes-Reyes</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Neural network design and model reduction approach for black box nonlinear system identification with reduced number of parameters</article-title>
          .
          <source>Neurocomputing</source>
          <volume>101</volume>
          :
          <fpage>170</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Essertel</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Rompf</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Demystifying differentiable programming: Shift/reset the penultimate backpropagator</article-title>
          . arXiv preprint arXiv:1803.10228.
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>X.-Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.-G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Yao.jl: Extensible, efficient quantum algorithm design for humans</article-title>
          .
          <source>In preparation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <string-name>
            <surname>Álvarez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Luengo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>N. D.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Latent force models</article-title>
          . In
          <string-name>
            <surname>van Dyk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , eds.,
          <source>Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics</source>
          , volume
          <volume>5</volume>
          of
          <source>Proceedings of Machine Learning Research</source>
          ,
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
          . Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA: PMLR.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>