<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Learning Dynamical Systems across Environments</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Yuan</forename><surname>Yin</surname></persName>
							<email>yuan.yin@lip6.fr</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">Sorbonne Université</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">LIP6</orgName>
								<address>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ibrahim</forename><surname>Ayed</surname></persName>
							<email>ibrahim.ayed@lip6.fr</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">Sorbonne Université</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">LIP6</orgName>
								<address>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="laboratory">Theresis Lab</orgName>
								<orgName type="institution">Thales</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Emmanuel</forename><surname>De Bézenac</surname></persName>
							<email>emmanuel.de-bezenac@lip6.fr</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">Sorbonne Université</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">LIP6</orgName>
								<address>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Patrick</forename><surname>Gallinari</surname></persName>
							<email>patrick.gallinari@lip6.fr</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">Sorbonne Université</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">LIP6</orgName>
								<address>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Criteo AI Lab</orgName>
								<address>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Learning Dynamical Systems across Environments</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D1E8C7DF2D622D5126F4711B5DA000E6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:21+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Learning the behavior of natural phenomena automatically from data has gained much traction in recent years. However, in most real-world scenarios, the environment in which the data samples are acquired varies and may not be the same for each sample. This is due to different circumstances, e.g. acquisition at different spatial locations, or simply experimental settings that slightly differ. This severely hinders the training process and makes the standard learning framework inapplicable. In this work, we propose a novel framework for modeling physical systems in this context, in which we leverage the data across different environments in order to learn the underlying dynamical systems, ensuring generalization without compromising the model's expressiveness and predictive performance. We instantiate our framework on two different families of dynamical systems, showing that our approach yields superior results over the classical learning approach as well as against competitive baselines. Finally, we show that we are also able to accelerate and improve learning for environments that have never been seen before.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Natural phenomena are often difficult to understand due to the complex and nonlinear interactions between their composing elements, making it cumbersome to derive a mathematical model describing them. In this context, a data-driven approach arises as a powerful alternative to classical modeling methods, as an unknown model can be learned automatically from the data. Recently, much effort has been focused in this direction <ref type="bibr" target="#b7">(Giannakis and Majda 2012;</ref><ref type="bibr" target="#b11">Mangan et al. 2017)</ref>, with a particular emphasis on using neural networks <ref type="bibr" target="#b13">(Raissi, Perdikaris, and Karniadakis 2019;</ref><ref type="bibr" target="#b5">Chen et al. 2018;</ref><ref type="bibr" target="#b1">Ayed et al. 2019)</ref> for treating cases where the underlying processes are largely unknown. Despite promising results, these methods usually postulate an idealized setting where data is abundant and the environment in which it is acquired is always the same. In practice, however, this is rarely the case, as obtaining real-world data samples may be expensive. Perhaps more importantly, the environment in which they are acquired may vary. These changes can be caused by different factors: for example, in climate modeling, external forces such as the Coriolis force vary across spatial locations <ref type="bibr" target="#b10">(Madec et al. 2019)</ref>, while in cardiac computational models, parameters need to be personalized for each patient <ref type="bibr">(Neic et al. 2017)</ref>.</p><p>The classical learning paradigm in this context is to treat all the data as independent and identically distributed, thus disregarding the discrepancies between the environments. As this assumption does not hold, it leads to a biased solution and results in an average model that performs poorly. Conversely, one may choose to avoid this assumption altogether by splitting the data from the different environments and learning one dynamical system per environment, separately. However, this ignores the similarities between environments and severely affects generalization performance, especially in settings where per-environment data is limited.</p><p>In this work, our goal is to account for the differences between environments while exploiting the similarities across them. To this end, we propose the LEarning Across Dynamical Systems (LEADS) framework, a novel learning methodology where the dynamics are decomposed into two components: one shared across all environments, and another that captures the dynamics that cannot be expressed by the shared component, and only those. This allows us to leverage the data from similar environments automatically, without compromising the expressiveness of the model. We demonstrate the effectiveness of our framework on two standard examples of dynamics given by differential equations: the Lotka-Volterra predator-prey model, expressed as an ODE, and the Gray-Scott reaction-diffusion equations, expressed as PDEs. Finally, we also show that our method accelerates and improves learning for similar unseen environments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Approach: Problem Setting</head><p>We consider the problem of learning unknown physical processes from data acquired in different environments. For each environment e ∈ E, we assume that the data is generated by an unknown governing differential equation:</p><formula xml:id="formula_0">dX t dt = f e (X t )<label>(1)</label></formula><p>defined over a finite time interval [0, T ], where the state X is either vector-valued, i.e. X t ∈ R d (Lotka-Volterra equations in the Experiments section), or a d-dimensional vector field over a bounded spatial domain Ω ⊂ R k , i.e. for t ∈ [0, T ] and x ∈ Ω, X t (x) ∈ R d . As stated above, modifications in the environment have an impact on the dynamics of the system, and the evolution terms f e are thus expected to differ. Nevertheless, we do assume some form of similarity between environments: as we will see in the following, this is not a necessary condition for our framework to be applicable, but it is what allows us to leverage the data from the other environments.</p><p>As in <ref type="bibr" target="#b0">Arjovsky et al. (2020)</ref>, we choose not to discard the information about where the data was collected. We construct our training set D from samples (e, {X e,i } i=1,...,Ne ), each composed of the environment identifier e together with a set of trajectories, where X e,i , the i-th trajectory in environment e, is a function satisfying Equation <ref type="formula" target="#formula_0">1</ref>.</p></div>
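As an illustration of this data setup, the following minimal numpy sketch (all names and toy dynamics here are hypothetical, not the paper's code) builds a training set keyed by the environment identifier e, with a set of trajectory snapshots per environment:

```python
import numpy as np

def euler_trajectory(f, x0, dt, n_steps):
    """Integrate dX/dt = f(X) with explicit Euler, returning snapshots X_{k*dt}."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        traj.append(traj[-1] + dt * f(traj[-1]))
    return np.stack(traj)  # shape (n_steps + 1, d)

def build_dataset(env_fns, x0s, dt=0.5, n_steps=20):
    """D keeps the environment identifier e next to its trajectories {X^{e,i}}."""
    return {e: np.stack([euler_trajectory(f_e, x0, dt, n_steps) for x0 in x0s])
            for e, f_e in env_fns.items()}

# toy per-environment dynamics: same functional form, environment-specific parameters
env_fns = {
    0: lambda x: np.array([1.0 * x[0] - x[0] * x[1], x[0] * x[1] - 1.0 * x[1]]),
    1: lambda x: np.array([0.5 * x[0] - x[0] * x[1], x[0] * x[1] - 0.5 * x[1]]),
}
D = build_dataset(env_fns, x0s=[np.array([1.0, 1.0]), np.array([1.5, 1.0])])
```

Each entry of `D` thus pairs an environment identifier with an array of shape (trajectories, snapshots, state dimension), mirroring the samples (e, {X e,i }) described above.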
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Related Work</head><p>To make prediction performance invariant across environments, IRM <ref type="bibr" target="#b0">(Arjovsky et al. 2020</ref>) aims at finding a classifier that retains the correlations that hold independently of the environment, excluding spurious environment-related ones. However, in the context of dynamical systems, modeling the bias of each environment is as important as modeling the invariant information, as both are indispensable for prediction. This makes IRM incompatible with our setting. <ref type="bibr" target="#b15">Spieckermann et al. (2015)</ref> and <ref type="bibr" target="#b4">Bird and Williams (2019)</ref> use RNNs conditioned on an environment code to perform biased learning in different environments. Nonetheless, the similarity between environments is not explicitly exploited as common invariant dynamical information.</p><p>In terms of robustness at test time, our formulation with a common term is related to Multi-Task Learning (MTL) and Distributionally Robust Optimization (DRO). <ref type="bibr" target="#b2">Baxter (2000)</ref> suggests that jointly learning related tasks in MTL can result in better generalization than learning a model individually for each task. DRO approaches such as <ref type="bibr" target="#b3">Bietti et al. (2019)</ref> and <ref type="bibr" target="#b16">Staib and Jegelka (2019)</ref> suggest that, in general loss minimization, imposing a certain norm penalty on neural networks (or other models) can encourage better generalization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The Proposed Framework: LEADS</head><p>As the dynamical systems in equation (<ref type="formula" target="#formula_0">1</ref>) are unknown, we learn them from the data by parametrizing the evolution terms f e with neural networks, as in <ref type="bibr" target="#b1">Ayed et al. (2019)</ref> and <ref type="bibr" target="#b5">Chen et al. (2018)</ref>. The problem now lies in how these terms are instantiated. We consider decomposing the dynamics into two components: one g ∈ F shared across environments, and another, environment-dependent component h e ∈ F, such that if F is large enough, there should exist a couple (g, h e ) ∈ F<ref type="foot" target="#foot_1">2</ref> whose sum recovers the dynamics of environment e, i.e.</p><p>∀e ∈ E, f e = g + h e</p><p>(2)</p><p>The general idea is that, since g is the same for each environment, it can be learned using all data points, across all environments. However, this decomposition admits a potentially infinite number of solutions, in particular the trivial one obtained by setting g to the null function: in that case, data across environments cannot be leveraged.</p><p>In order to avoid this trivial solution, we would like the shared function g to explain as much of the dynamics as possible, and in turn make the environment-dependent function h e as small as possible. The following constrained optimization problem embeds this idea:</p><formula xml:id="formula_1">min g,he∈F e h e 2 subject to ∀(e, X e,i t ) ∈ D, dX e,i t dt = (g + h e )(X e,i t )<label>(3)</label></formula><p>Let us consider the limit case where the dynamics are the same across environments, i.e. ∀e ∈ E, f e = f : this objective then yields the couple (g = f, h = 0), meaning that the common information, which is all there is, is entirely captured by g, as expected. 
This will benefit its generalization performance, as all the data is used, including data from different environments.</p><p>We now instantiate our method, providing a practical implementation of the previous objective. In practice, we do not have access to the data trajectories at every instant t but only to a finite number of snapshots {X i k∆t } 0≤k≤ T /∆t at a temporal resolution ∆t. We take the Lagrangian formulation of the proposed objective as our training loss. Instead of comparing the evolution terms as in Equation <ref type="formula" target="#formula_1">3</ref>, we directly compare the trajectories they induce<ref type="foot" target="#foot_0">1</ref><ref type="foot" target="#foot_1">2</ref>:</p><formula xml:id="formula_2">L(g, h, λ) = e∈E 1 λ h e 2 + Ne i=1 K k=1 X e,i k∆t − Xe,i k∆t 2</formula><p>(4) where Xe,i k∆t = X e,i 0 + k∆t 0 (g + h e )( Xe,i s ) ds is the trajectory state obtained by integrating g + h e from X e,i 0 up to t = k∆t with a DE solver. Note that λ is treated as a divisor of h e 2 rather than as a multiplier of the constraint term. This is equivalent to optimizing the original Lagrangian, but is more amenable to gradient-descent-based methods when λ is very large. With an adequate algorithm optimizing g, h and λ in practice, we should arrive at the optimal g and h e as λ → +∞. However, solving such an optimization problem is difficult, as a varying λ constantly changes the loss surface, which hinders learning in the context of dynamical systems. We therefore treat λ as a hyperparameter for each experiment; this should not affect the non-nullity of g, even though the constraints in Equation 3 will then not be perfectly satisfied at the optimum.</p><p>It is important to note that LEADS is independent of the choice of the function space F. We choose neural networks here for their expressiveness, in order to validate our framework. One can apply LEADS to any feasible function space expressed by other data-driven methods.</p></div>
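To make the training objective concrete, here is a minimal numerical sketch of the loss in Equation 4. This is a hedged illustration, not the paper's implementation: linear maps stand in for the neural networks g and h_e, a crude Euler scheme replaces the differentiable DE solver, and all parameter values are toy choices.

```python
import numpy as np

def rollout(W, x0, dt, n_steps):
    """Euler rollout of the linear dynamics dX/dt = W X (stand-in for a neural net)."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        traj.append(traj[-1] + dt * W @ traj[-1])
    return np.stack(traj)

def leads_loss(g, h, data, dt, lam):
    """L(g, h, lam) = sum_e (1/lam)||h_e||^2 + sum_{i,k} ||X_{k*dt} - X~_{k*dt}||^2,
    with X~ integrated from X_0 under the summed dynamics g + h_e (cf. Eq. 4)."""
    loss = 0.0
    for e, trajs in data.items():
        loss += np.sum(h[e] ** 2) / lam              # penalty on the env-specific part
        for true_traj in trajs:
            pred = rollout(g + h[e], true_traj[0], dt, len(true_traj) - 1)
            loss += np.sum((true_traj - pred) ** 2)  # trajectory data-fit term
    return loss

# toy data generated from a shared part G plus small env-specific residuals H_e
rng = np.random.default_rng(0)
G = np.array([[0.0, -1.0], [1.0, 0.0]])
H = {0: 0.1 * np.eye(2), 1: -0.1 * np.eye(2)}
data = {e: np.stack([rollout(G + H[e], rng.standard_normal(2), 0.1, 10)
                     for _ in range(3)]) for e in H}

loss_true = leads_loss(G, H, data, dt=0.1, lam=100.0)          # only the penalty remains
loss_null = leads_loss(G, {e: np.zeros((2, 2)) for e in H}, data, dt=0.1, lam=100.0)
```

At the generating parameters the data-fit term vanishes and only the (1/λ)‖h_e‖² penalty remains, whereas dropping the environment-specific parts leaves a much larger trajectory error, which is exactly the trade-off the objective encodes.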
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiments</head><p>We conduct our experiments on two complex nonlinear dynamical systems. The first is an ODE-driven biological dynamical system, and the second a PDE-driven reaction-diffusion model exhibiting many complex behaviors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lotka-Volterra Equation</head><p>We consider this classical model <ref type="bibr" target="#b9">(Lotka 1926)</ref>, frequently used to describe the dynamics of the interaction between a predator and a prey in an ecosystem. The dynamics follow the equations:</p><formula xml:id="formula_4">dx dt = αx − βxy, dy dt = δxy − γy</formula><p>where x, y are the quantities of the prey and the predator, and α, β, γ, δ define how the two species interact. In fact, by a proper rescaling one can absorb β and δ into x and y. We therefore keep β, δ constant by setting β = δ = 1 across all environments and let α, γ depend on the environment. The nonlinear interaction between the two species is therefore the environment-independent component, while the linear terms are linked to the environments. We thus define the parameter θ e = (α e , γ e ) for each environment e. Note that choosing θ e consequently determines the second fixed point of the system, (γ e , α e ), around which the trajectories orbit. The system state is X t = (x t , y t ). The initial conditions are fixed across environments, i.e. ∀e, X e,i 0 = X i 0 . Starting from the same initial condition X i 0 = (x i 0 , y i 0 ), we simulate only 1 trajectory per environment for training and 32 for test. Note that the test set is much larger than the training one. The step size is ∆t = 0.5 and the dataset horizon is T = K∆t = 10. The experiments are conducted with 4 and 10 environments.</p></div>
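For concreteness, the data-generation process described above can be sketched as follows. This is an illustrative reimplementation, not the paper's simulation code: a standard RK4 integrator, one hypothetical environment θ_e = (α_e, γ_e), and a finer step than the dataset's ∆t. For β = δ = 1, the first integral V = x − γ ln x + y − α ln y is constant along exact orbits, giving a simple sanity check on the integrator.

```python
import numpy as np

def lv_rhs(x, alpha, gamma):
    """Lotka-Volterra with beta = delta = 1: dx/dt = a*x - x*y, dy/dt = x*y - g*y."""
    return np.array([alpha * x[0] - x[0] * x[1], x[0] * x[1] - gamma * x[1]])

def rk4_step(f, x, dt):
    """One classical Runge-Kutta 4 step."""
    k1 = f(x)
    k2 = f(x + dt / 2 * k1)
    k3 = f(x + dt / 2 * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(theta, x0, dt=0.01, n_steps=1000):
    alpha, gamma = theta
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(rk4_step(lambda x: lv_rhs(x, alpha, gamma), xs[-1], dt))
    return np.stack(xs)

# one hypothetical environment; trajectories orbit the fixed point (gamma_e, alpha_e)
traj = simulate(theta=(0.8, 1.2), x0=(1.5, 1.5))
# conserved quantity along exact orbits: V = x - gamma*ln(x) + y - alpha*ln(y)
V = traj[:, 0] - 1.2 * np.log(traj[:, 0]) + traj[:, 1] - 0.8 * np.log(traj[:, 1])
```

With dt = 0.01 the drift of V over the horizon is negligible, confirming that the simulated orbits indeed cycle around (γ_e, α_e) as stated above.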
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Gray-Scott Equation</head><p>This reaction-diffusion model is famous for its Turing patterns and for the complexity of its behaviors relative to the simplicity of its equations <ref type="bibr" target="#b12">(Pearson 1993)</ref>. The governing PDE is:</p><formula xml:id="formula_5">∂u ∂t = D u ∆u − uv 2 + F (1 − u) ∂v ∂t = D v ∆v + uv 2 − (F + k)v</formula><p>where X e t = (u e t , v e t ) is the state over a given spatial domain Ω, with periodic boundary conditions. D u and D v denote the diffusion coefficients of u and v respectively, which are constant <ref type="bibr" target="#b12">(Pearson 1993)</ref>. F and k together define the type of the corresponding patterns and behaviors. The diffusion terms are therefore the environment-independent component and the reaction terms the environment-dependent one.</p><p>We therefore choose parameters θ e = (F e , k e ) for each environment e to simulate data. As for the Lotka-Volterra equations, the initial conditions are shared across environments and we simulate one trajectory per environment for training and 32 trajectories for test. The step size is ∆t = 20 and the horizon is T = K∆t = 200. The experiments are conducted with 3 environments.</p><p>Training Details Within the experiments for each equation, the functions g, h are neural networks with the same architecture. We use 4-layer MLPs for Lotka-Volterra and 4-layer ConvNets for Gray-Scott. We apply Swish as the default activation function <ref type="bibr" target="#b14">(Ramachandran, Zoph, and Le 2017)</ref>. These networks are integrated in time using the differentiable solver implemented by <ref type="bibr" target="#b5">Chen et al. (2018)</ref>, with basic backpropagation through the internals of the solver rather than the adjoint method. We apply an exponential Scheduled Sampling <ref type="bibr">(Lamb et al. 2016</ref>) with exponent 0.99 to stabilize the training. 
We use the Adam optimizer <ref type="bibr" target="#b8">(Kingma and Ba 2015)</ref> across all experiments, with the same learning rate of 1 × 10 −3 and (β 1 , β 2 ) = (0.9, 0.999). For the operator norm acting on h e , we opt for max i,k h e (X e,i k∆t ) 2 / X e,i k∆t 2 , where the X e,i are the training sample trajectories. In order for the estimate of this norm not to deviate on test data, we also control an upper bound on the associated Lipschitz constant, as suggested in <ref type="bibr" target="#b3">Bietti et al. (2019)</ref>.</p><p>Baselines We introduce the following baselines to compare with the proposed formulation:</p><p>• Env. Indep.: the sum of two environment-independent neural networks g + h, learned with the standard ERM learning principle, as in <ref type="bibr" target="#b1">Ayed et al. (2019)</ref> <ref type="foot" target="#foot_2">3</ref> ,</p><p>• Env. Dep. Sum: the sum of two environment-dependent NNs g e + h e ,</p><p>• LEADS no min.: our proposal without the norm penalty, equivalent to LEADS with λ = +∞.</p><p>We show the results in Table <ref type="table" target="#tab_0">1</ref>. For the Lotka-Volterra systems, we first confirm that the entire dataset cannot be fit with a single pair of NNs (Env. Indep.). Compared with the other baselines, LEADS reduces the test MSE of Env. Dep. Sum by nearly 4/5 and that of LEADS no min. by 1/3 when there are #E = 4 environments. Figure <ref type="figure" target="#fig_0">1</ref> shows samples of predicted test trajectories: LEADS almost overlaps the ground-truth trajectory, while Env. Dep. Sum underperforms in most environments. When the number of environments is increased to #E = 10, the error reduction is over 85% w.r.t. Env. Dep. Sum and over 40% w.r.t. LEADS no min.</p><p>We observe the same improving tendency for the Gray-Scott systems. The error of LEADS is around 1/2 of the Env. Dep. Sum test MSE and 60% of the LEADS no min. test MSE. In Figure <ref type="figure" target="#fig_1">2(a)-(c</ref>), the states obtained with our method are qualitatively closer to the ground truth. 
With the help of the error maps in Figure <ref type="figure" target="#fig_1">2</ref>(d) and (e), we see that at the rightmost, final-time frames, the errors are systematically reduced across all environments. This shows that LEADS accumulates fewer errors through the integration, which suggests that LEADS alleviates overfitting on the data support. </p></div>
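For reference, the Gray-Scott dynamics studied above can be reproduced with a simple explicit finite-difference scheme. This is a minimal sketch with a periodic 5-point Laplacian; the grid size, time step, and parameter values are illustrative choices, not the paper's simulation settings.

```python
import numpy as np

def laplacian(Z):
    """5-point Laplacian with periodic boundary conditions (unit grid spacing)."""
    return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
            np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4.0 * Z)

def gray_scott_step(u, v, Du, Dv, F, k, dt):
    """One explicit Euler step of the Gray-Scott reaction-diffusion equations."""
    uvv = u * v * v
    u_new = u + dt * (Du * laplacian(u) - uvv + F * (1.0 - u))
    v_new = v + dt * (Dv * laplacian(v) + uvv - (F + k) * v)
    return u_new, v_new

# illustrative parameters: Du, Dv shared across environments, theta_e = (F_e, k_e) per env
Du, Dv, F, k = 0.16, 0.08, 0.035, 0.065
n = 32
u = np.ones((n, n))
v = np.zeros((n, n))
u[12:20, 12:20] = 0.50   # seed a local perturbation so patterns can develop
v[12:20, 12:20] = 0.25
for _ in range(200):
    u, v = gray_scott_step(u, v, Du, Dv, F, k, dt=1.0)
```

The environment-dependent reaction parameters (F, k) select which pattern family emerges, while the shared diffusion coefficients are kept fixed, mirroring the decomposition used in the experiments.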
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Learning in Unknown Environments</head><p>We demonstrate how the learned invariant dynamics can boost fitting in new, similar environments. We now suppose that we have an invariant function ĝ learned with LEADS from L-V (#E = 4). We then generate another Lotka-Volterra dataset in new environments E new , still with 1 trajectory per environment in the training set and 32 in the test set.</p><p>Let us consider the following adaptation strategies:</p><p>• No adapt.: a sanity check to ensure that the new dynamics cannot be predicted by ĝ without further adaptation,</p><p>• Env. Dep. Sum from scratch: the sum of two environment-dependent NNs, trained from scratch, no boosting by ĝ,</p><p>• Env. Dep. Single from scratch: an environment-dependent NN, trained from scratch, no boosting by ĝ,</p><p>• LEADS boosted Env. Dep. Single: an environment-dependent NN h e trained with the boost of the learned ĝ.</p><p>Table <ref type="table" target="#tab_1">2</ref> contains the adaptation results at training iterations from 50 to 10000. With No adapt., we first show that ĝ alone is not able to predict in any of these new environments, even though they are closely related to the original ones. At iteration 50, we observe that the last three adaptations perform poorly, as expected, since they are at an early stage of training. As soon as iteration 250, LEADS boosted Env. Dep. Single already surpasses the best performance of the trained-from-scratch methods (Env. Dep. Sum and Env. Dep. Single from scratch) at iteration 10000. This clearly shows that the learned shared dynamics improve and accelerate learning in new environments.</p></div>
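The boosted adaptation strategy can be sketched numerically as follows. This is a toy illustration under strong simplifying assumptions, not the paper's procedure: linear maps stand in for the networks, the frozen shared part plays the role of ĝ, and only the environment-specific residual h_e is fitted by gradient descent on a one-step prediction loss.

```python
import numpy as np

dt = 0.1
G_hat = np.array([[0.0, -1.0], [1.0, 0.0]])    # frozen shared dynamics (role of g-hat)
H_true = np.array([[-0.1, 0.0], [0.0, -0.2]])  # unknown residual of the new environment

# one training trajectory from the new environment (Euler snapshots)
xs = [np.array([1.5, 1.0])]
for _ in range(50):
    xs.append(xs[-1] + dt * (G_hat + H_true) @ xs[-1])
X, Y = np.stack(xs[:-1]), np.stack(xs[1:])     # one-step (input, target) pairs

def one_step_mse(H):
    """Mean squared one-step prediction error under dynamics g-hat + h_e."""
    pred = X + dt * X @ (G_hat + H).T
    return np.mean(np.sum((Y - pred) ** 2, axis=1))

# adapt: gradient descent on h_e only; g-hat stays frozen throughout
H = np.zeros((2, 2))
lr = 5.0
for _ in range(2000):
    residual = (X + dt * X @ (G_hat + H).T) - Y
    grad = 2.0 * dt * residual.T @ X / len(X)  # gradient of one_step_mse w.r.t. H
    H -= lr * grad
```

Since only the small residual has to be estimated while the shared structure is given, the adapted model reaches a low error with a single short trajectory, which is the mechanism behind the fast convergence reported in Table 2.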
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>We introduce LEADS, a data-driven framework to learn dynamics from data collected from a set of similar yet different dynamical systems. As demonstrated on two complex families of systems, our framework significantly improves test performance in every environment, especially when the number of available trajectories is limited. We finally show that the shared dynamics extracted by LEADS can boost learning in similar new environments, leading us towards a more flexible framework for prediction and generalization in new environments.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Comparison between test trajectories (blue) and ground truth (red), shown in phase space. Blue trajectories are predicted by (a) Env. Dep. Sum and (b) LEADS for Lotka-Volterra in 4 environments (env. 1 to 4 from left to right).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Comparison of trajectories from (a) Env. Dep. Sum and (b) LEADS with (c) the ground truth for the Gray-Scott equation. Each row represents an environment. We show the state of channel u at t = 0, . . . , 5∆T . They are accompanied by the maps of prediction error at the rightmost timestep by (d) Env. Dep. Sum and (e) LEADS. The larger the error, the brighter the pixel at the corresponding coordinates.</figDesc><graphic coords="4,54.00,63.96,141.12,71.19" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Comparison between LEADS and baselines for Lotka-Volterra (in 4 and 10 envs.) and Gray-Scott equations (in 3 envs.).</figDesc><table><row><cell/></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Comparison of different adaptation strategies in 2 new environments of Lotka-Volterra at different iterations.</figDesc><table><row><cell>Adaptation</cell><cell></cell><cell cols="2">MSE test at iteration</cell></row><row><cell></cell><cell>50</cell><cell>250</cell><cell>500</cell><cell>10000</cell></row><row><cell>No adapt.</cell><cell></cell><cell cols="2">-0.36 -</cell></row><row><cell>Env. Dep. Sum from scratch</cell><cell cols="2">0.23 5.02e-2</cell><cell>0.25</cell><cell>3.05e-3</cell></row><row><cell>Env. Dep. Single from scratch</cell><cell>1.65</cell><cell>18.3</cell><cell cols="2">8.87e-2 4.13e-3</cell></row><row><cell>LEADS boosted Env. Dep. Single</cell><cell cols="4">0.73 2.06e-3 1.84e-3 1.11e-3</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Note that both are equivalent when ∆t tends to 0.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Directly comparing the (approximate) evolution terms is possible using finite differences, but led to worse results.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">We have opted for the sum as it allows for a proper comparison with our method.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work was partially funded by Locust ANR-15-CE23-0027 and Chaires de recherche et d'enseignement en intelligence artificielle (Chaires IA), DL4Clim project (PG).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Arjovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gulrajani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lopez-Paz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.02893</idno>
		<title level="m">Invariant Risk Minimization</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>cs, stat</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Learning Dynamical Systems from Partial Observations</title>
		<author>
			<persName><forename type="first">I</forename><surname>Ayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>De Bézenac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pajot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Brajard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gallinari</surname></persName>
		</author>
		<idno>CoRR abs/1902.11136</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A Model of Inductive Bias Learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Baxter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Artif. Int. Res</title>
		<idno type="ISSN">1076-9757</idno>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="149" to="198" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A Kernel Perspective for Regularizing Deep Neural Networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bietti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Mialon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mairal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 36th International Conference on Machine Learning</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Chaudhuri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</editor>
		<meeting>the 36th International Conference on Machine Learning<address><addrLine>Long Beach, California, USA</addrLine></address></meeting>
		<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">97</biblScope>
			<biblScope unit="page" from="664" to="674" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Customizing Sequence Generation with Multi-Task Dynamical Systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bird</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K I</forename><surname>Williams</surname></persName>
		</author>
		<idno>CoRR abs/1910.05026</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Neural Ordinary Differential Equations</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Rubanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bettencourt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">K</forename><surname>Duvenaud</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Grauman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Cesa-Bianchi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc.</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="6571" to="6583" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability</title>
		<author>
			<persName><forename type="first">D</forename><surname>Giannakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Majda</surname></persName>
		</author>
		<idno type="DOI">10.1073/pnas.1118984109</idno>
		<ptr target="https://www.pnas.org/content/109/7/2222" />
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<idno type="ISSN">0027-8424</idno>
		<imprint>
			<biblScope unit="volume">109</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="2222" to="2227" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Adam: A Method for Stochastic Optimization</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">3rd International Conference on Learning Representations, ICLR 2015</title>
		<editor>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</editor>
		<meeting><address><addrLine>San Diego, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015-05-07">May 7-9, 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8a">
	<monogr>
		<title level="m" type="main">Professor Forcing: A New Algorithm for Training Recurrent Networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lamb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1610.09038</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Elements of Physical Biology</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Lotka</surname></persName>
		</author>
		<ptr target="http://www.jstor.org/stable/43430362" />
	</analytic>
	<monogr>
		<title level="j">Science Progress in the Twentieth Century</title>
		<idno type="ISSN">20594941</idno>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">82</biblScope>
			<biblScope unit="page" from="341" to="343" />
			<date type="published" when="1926">1926</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Madec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bourdallé-Badie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chanut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Clementi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Coward</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ethé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Iovino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lévy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lovato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Masson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mocavero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Rousset</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Storkey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vancoppenolle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Müeller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Nurser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Samson</surname></persName>
		</author>
		<title level="m">NEMO ocean engine. Add SI3 and TOP reference manuals</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Model selection for dynamical systems via sparse regression and information criteria</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">M</forename><surname>Mangan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">N</forename><surname>Kutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">L</forename><surname>Brunton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Proctor</surname></persName>
		</author>
		<idno type="DOI">10.1098/rspa.2017.0009</idno>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences</title>
		<imprint>
			<biblScope unit="volume">473</biblScope>
			<biblScope unit="issue">2204</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11a">
	<analytic>
		<title level="a" type="main">Efficient computation of electrograms and ECGs in human whole heart simulations using a reaction-eikonal model</title>
		<author>
			<persName><forename type="first">A</forename><surname>Neic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">O</forename><surname>Campos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Prassl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Niederer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Bishop</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Vigmond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Plank</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computational Physics</title>
		<idno type="ISSN">0021-9991</idno>
		<imprint>
			<biblScope unit="volume">346</biblScope>
			<biblScope unit="page" from="191" to="211" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Complex Patterns in a Simple System</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Pearson</surname></persName>
		</author>
		<idno type="DOI">10.1126/science.261.5118.189</idno>
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<idno type="ISSN">0036-8075</idno>
		<imprint>
			<biblScope unit="volume">261</biblScope>
			<biblScope unit="issue">5118</biblScope>
			<biblScope unit="page" from="189" to="192" />
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations</title>
		<author>
			<persName><forename type="first">M</forename><surname>Raissi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Perdikaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Karniadakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computational Physics</title>
		<imprint>
			<biblScope unit="volume">378</biblScope>
			<biblScope unit="page" from="686" to="707" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Searching for Activation Functions</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ramachandran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zoph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<idno>CoRR abs/1710.05941</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Exploiting similarity in system identification tasks with recurrent neural networks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Spieckermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Düll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Udluft</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hentschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Runkler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">169</biblScope>
			<biblScope unit="page" from="343" to="349" />
		</imprint>
	</monogr>
	<note>ESANN 2014 Industrial Data Processing and Analysis; Learning for Visual Semantic Understanding in Big Data</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Distributionally robust optimization and generalization in kernel methods</title>
		<author>
			<persName><forename type="first">M</forename><surname>Staib</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jegelka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="9134" to="9144" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
