Dynamical Kinds and their Discovery Benjamin C. Jantzen Philosophy Dept. Virginia Tech Blacksburg, VA 24061 Abstract population, the learning problem is, roughly, to ascertain the set of direct, possibly stochastic causal relations amongst We demonstrate the possibility of classifying variables and, often, a model of the explicit functional form causal systems into kinds that share a common of those causal dependencies. Similarly, there is vast lit- structure without first constructing an explicit dy- erature on system identification that focuses on producing namical model or using prior knowledge of the a model of a given complex dynamical system that is suit- system dynamics. The algorithmic ability to de- able for prediction and control (Norton, 1986; Ljung, 2010). termine whether arbitrary systems are governed While these lines of research differ in terminology, methods, by causal relations of the same form offers sig- and emphasis – system identification tends to focus on deter- nificant practical applications in the development ministic systems, while the GCM formalism is designed for and validation of dynamical models. It is also representing stochastic causal relations – they both provide of theoretical interest as an essential stage in the methods for discovering causal structure. They also share scientific inference of laws from empirical data. an emphasis on the individual system – particular systems The algorithm presented is based on the dynam- of causal relations are grouped together into categories only ical symmetry approach to dynamical kinds. A after the fact, and generally without a consistent aim or dynamical symmetry with respect to time is an motivation. intervention on one or more variables of a system This exclusive focus on individual systems, however, leaves that commutes with the time evolution of the sys- out a significant element of scientific inference. Much sci- tem. A dynamical kind is a class of systems shar- entific work, especially in the early exploration of a new ing a set of dynamical symmetries. The algorithm phenomena, involves grouping distinct physical systems into presented classifies deterministic, time-dependent categories. These categories are then targets for law-like causal systems by directly comparing their ex- generalization. In other words, a primary goal of scientific hibited symmetries. Using simulated, noisy data research is the identification of laws of kinds of system, not from a variety of nonlinear systems, we show just models of one-off systems. So, for example, we seek that this algorithm correctly sorts systems into laws of chemical kinetics, not just models of the particu- dynamical kinds. It is robust under significant lar reaction in some specific beaker on a lab bench. We sampling error, is immune to violations of nor- seek laws of motion, not just a model of how this particular mality in sampling error, and fails gracefully with pendulum moves given a variable arm length. This sort increasing dynamical similarity. The algorithm of inference to kinds is not reflected in the causal learning we demonstrate is the first to address this aspect or system identification algorithms that have been devel- of automated scientific discovery. oped – both approaches to learning causal structure allow for classification only after model creation. But what if such classification could be done prior to learn- 1 The problem ing detailed models of the causal structure of individual systems? In that case, a number of inferential tasks would Recent decades have seen remarkable progress in addressing become substantially easier. For example, if one knew in the problem of learning causal structure from observational advance that multiple systems are governed by a common or experimental data. One especially rich approach deploys type of causal structure – whether representable by, say, a the framework of graphical causal models (GCMs) (Pearl, parametric class of differential equations or an equivalence 2000; Spirtes et al., 2000; Spirtes and Zhang, 2016). Given class of GCMs – then one could pool data from multiple a particular set of variables describing a specific system or systems in order to determine the form of the differential system or evolves the system and then applies σk . Note that equation or an equivalence class of GCMs governing sys- this is equivalent to the condition that dσk (xA (t))/dt = tems of that sort. Or consider the problem of model valida- rA σk (xA (t)). Even if these transformations were unknown tion. For sufficiently complex computer models, validation to us, there is a simple way to ascertain whether the σk for can be exceedingly difficult because the dynamical relations species A are different from the corresponding transforma- amongst high-level variables in the simulation are unknown tions, τ , that commute with the time evolution of species (Mattingly and Warwick, 2009). But if it were possible to B. To do so, we can read these transformations – at least ascertain directly – without an explicit model of the high- over a finite interval – directly off of our experimental data. level dynamics in either the simulation or the target system Suppose we make simultaneous measurements of each pop- – that the dynamics are of the same type, this would provide ulation at m times, t0 , t1 , . . . , tm . Suppose further that, for a tractable route to validation. Finally, one could assess each species, two experiments are conducted. In the first, whether or not a system with unknown dynamics exhibits an intervention sets the initial populations to xA (t0 ) and transitions between different dynamical regimes, such as the xB (t0 ) for species A and B, respectively, and in the sec- transition from laminar to turbulent flow in fluid dynamics. ond we set the initial populations to x̃A (t0 ) and x̃B (t0 ). In such transitions, one and the same system exhibits a dif- For each population, the resulting pair of trajectories must ferent dynamical form or causal structure at different times be connected by one of the special time-commuting trans- or in different circumstances. formations characteristic of that system. Empirical curves for σ and τ can thus be constructed by plotting x̃A (ti ) as For these reasons, a method for assessing the sameness a function of xA (ti ) and x̃B (ti ) as a function of xB (ti ). or difference of dynamical structure in the absence of any To decide whether the dynamics of the two systems have explicit dynamical model would be both practically and the same form it is only necessary to determine whether theoretically valuable. For such a direct classification to be σ(xA ) ≡ x̃A (xA ) = x̃B (xB ) ≡ τ (xB ). If species A really possible, two things are required. First, a clear and rigorous grows exponentially, then x̃A (xA ) = kxA , and detecting a definition of dynamical kind, and second, an empirical test difference in dynamics depends only on detecting deviation for sameness of kind. We suggest that the first component – from a straight line in the sample of x̃B (xB ). a rigorous definition – has been provided and well-motivated by the theoretical work of Jantzen (2014). To meet the sec- ond need, we present here both a test derived from this 2 Theoretical background and related work notion of dynamical kind and an explicit algorithm for au- tomating that test that is accurate and robust in the presence 2.1 Theory of dynamical kinds of significant measurement error in empirical data. The preceding example appealed to a feature of the dynam- ics of all exponential growth systems. That is, dynamical 1.1 Directly discerning dynamical difference: an systems were grouped into categories on the basis of high- example level features of their dynamics – in this case, the form of the governing differential equation – rather than features of To get a sense for how such a test might work, consider two specific trajectories or solutions of that equation. This is different species of bacteria, A and B, in dilute solutions of the essence of a dynamical kind – a class of causal systems nutrient-rich media. One might be interested in knowing that share some salient higher-order features though they whether the growth of both species is governed by the same may differ dramatically in the specific relations amongst dynamics under these conditions, as suggested by the “First variables. Law of Ecology” (Turchin, 2001) . Specifically, one might Of course, there are indefinitely many higher-order features be interested in whether both systems are well-represented of causal structure that could be used for classification in by differential equations of the same form, whatever that this way. We have chosen to focus on the variety of higher- form may be. The question can be approached directly by order properties that were used to sort dynamical kinds in considering the sorts of properties shared by two systems the bacterial growth example. Specifically, the notion of governed by equations of the same form. One such property dynamical kind we propose to exploit for purposes of auto- is the behavior of the system under intervention. More mated discovery is that proposed by Jantzen (2014). The specifically, for any given dynamics there is a special class motivation is two-fold. First, Jantzen’s proposal picks out of transformations of the system state that commute with classes of causal system that, if identified in the absence the evolution of that system state through time. of detailed individual models, would provide the sort of To be concrete, suppose species A grows exponentially, practical benefits for data-pooling, validation, and regime- such that dxA (t)/dt = rA xA (t). In that case, all and only detection described above. Second, as Jantzen argues in scaling transformations of the form σk (xA (t)) = kxA (t) detail, the classes carved out in this way coincide well with commute with the time-evolution of the system so that it existing scientific practice. Thus, automating the discov- doesn’t matter whether one applies σk and then evolves the ery of dynamical kinds in Jantzen’s sense would go a long way toward filling a gap in automated scientific inference, way, it’s clear that most symmetries of interest to physicists namely the absence of algorithmic methods for determining are dynamical symmetries with respect to time. Assuming classes of system appropriate for framing laws. that the dynamics is deterministic and well-described by a system of ordinary differential equations in time, we can In Jantzen’s view, what makes the dynamics of two distinct also restate the condition in terms of flows (Logemann and systems members of the same kind is a higher-order prop- Ryan, 2014). These are mappings φ : X × R → X where erty that they share: the structured set of transformations of X is the space of sates of the system at times represented each variable that commute with incrementation of another by the reals and under the conditions that φ(x, 0) = 0 and variable. These transformations are called ‘dynamical sym- φ(φ(x, t), s) = φ(x, t + s). Dynamical symmetries in the metries’ and are defined as follows (Jantzen, 2014, p3633): framework correspond to functions σ such that, for all x Definition 1 (Dynamical symmetry). Let V be a set of and t: variables. Let σ be an intervention on the variables in Int ⊂ V . The transformation σ is a dynamical symmetry with respect to some index variable X ∈ V −Int if and only φ(σ(x), t) = σ(φ(x, t)) (1) if σ has the following property: for all xi and xf , the final state of the system is the same whether σ is applied when Whatever formalism is appropriate for expressing dynamical X = xi and then an intervention on X makes it such that symmetries in a particular application, it is not necessarily X = xf , or the intervention on X is applied first, changing true that all functions satisfying a given formal condition its value from xi to xf , and then σ is applied. are proper dynamical symmetries. The latter are defined in terms of interventions on variables – interactions with Notice that this definition is substantive rather than mathe- a system that change its measured quantities in a way that matical in that it appeals to physical interactions of a certain is at least physically possible and which does not destroy sort, namely interventions in the GCM sense. As such, this the causal connections amongst them. Some formal trans- definition of dynamical symmetry is not tied to any partic- formations may represent unrealizable interventions such ular formalism or application. The notion of commuting as, perhaps, time reversal. For deterministic dynamics rep- interventions is certainly at home in the GCM framework, resented by a system of ODEs in time, every dynamical but it is also easily translated into the language of differen- symmetry can be represented by a mapping satisfying the tial equations as the commutation of system evolution and condition expressed by Equation 1. But if we take Jantzen’s the manipulation of initial or boundary conditions. It is the definition seriously, then it may not be the case that every latter framework we focus on in the remainder of this paper. such σ represents a dynamical symmetry. However, since In particular, we will be concerned with classes of system it is impossible to give a general account of what these ex- that are well-described by differential equations. In fact, ceptions are, we will make the idealizing assumption that, we’ll further restrict attention to systems with explicit time for the varieties of system under analysis here, there is a dependence. In this special case of temporal dynamics, time one-to-one correspondence between dynamical symmetries is a uniquely natural “index variable” – in a sense, one can’t and functions satisfying Equation 1. help but “intervene” to increment time in any given experi- ment. Thus, we are primarily interested in interventions on In Jantzen’s sense, a bare set of dynamical symmetries is subsets of variables which are dynamical symmetries with insufficient to characterize a dynamical kind. Further dis- respect to time (Jantzen, 2014, p3632): tinctions can be made if we note that for any given system of causally connected variables, the corresponding set of Definition 2 (Dynamical symmetry with respect to time). dynamical symmetries exhibits structure under composition. Let t be the variable representing time, and let V be a set In other words, one dynamical symmetry applied after an- of additional dynamical variables such that t ∈ / V . Let other is also a dynamical symmetry, and the equivalence σ be an intervention on the variables in Int ⊂ V . The relations amongst such compositions constitute a symmetry transformation σ is a dynamical symmetry with respect to structure (Jantzen, 2014, p3634-3635): time if and only if for all intervals ∆t, the final state of the system is the same whether σ is applied at some time t0 and Definition 3 (Nontrivial symmetry structure). The sym- the system evolved until t0 + ∆t, or the system first allowed metry structure of a collection of dynamical symmetries, to evolve from t0 to t0 + ∆t and then σ is applied. Σ = {σi |i = 1, 2, . . . } is given by the composition function ◦ : Σ × Σ → Σ. A nontrivial symmetry structure is one that To put this in terms familiar from mathematical physics contains the identity (i.e., “do nothing”) transformation and, (Rosen, 1995, 2008), let Λt,s (x) be the propagator for a for at least one variable Y relative to some index variable given system, i.e., the function Λt,s : X → X for which X, is not isomorphic to the group of mappings from Y to Λt,s (x) is the state of the system at time t given that x ∈ X itself (i.e., the set of all transformations of Y ). is the state at time s. A function σ : X → X represents a dynamical symmetry with respect to time if and only if for Jantzen identifies these nontrivial symmetry structures with every s, t > s, and x, Λt,s (σ(x)) = σ(Λt,s (x)). Put this the dynamical kinds of interest (2014, p3635): Definition 4 (Dynamical kind). A dynamical kind is a class of systems of variables that share a set of dynamical symme- t v0 v1 t v0 v1     tries that are related by a non-trivial symmetry structure. t0 a00 a01 t0 b00 b01  t1 a10 a11    t1 b10 b11      t2 a20 a21    t2 b20 b21   2.2 Related work .. .. .. .. .. ..   . . . . . . As suggested above, Jantzen’s theory of dynamical kinds is w  closely connected to the axiomatic treatment of causation that underwrites the increasingly diverse and powerful set  v0 v1 ṽ 0   v0 v1 ṽ 1  of causal discovery algorithms. In particular, Jantzen shows * a00 a01 b00 a00 a01 b01 + that a system of variables possesses a nontrivial symmetry  a10 a11 b10   a10 a11 b11  ,     structure S (one that is not isomorphic to the set of automor-  a20 a21 b20   a20 a21 b21  .. .. .. .. .. ..     phisms on the states of S) if and only if it has a non-trivial causal structure (one for which at least one variable is a di- . . . . . . rect cause of another)(2014, p3635-3636). This is the sense in which dynamical symmetries are high-level properties Figure 1: Schematic showing how samples from two time- of the causal structure of a system. There is also precedent series are restructured to obtain an implicit model of the for Jantzen’s appeal to symmetries in the machine learning dynamical symmetry that maps one trajectory into the other. literature. Schmidt and Lipson (2009) developed an auto- mated discovery algorithm that learns free-form laws by seeking invariants of the motion – functions of the dynam- ical variables whose values remain constant through time. result, for each system, is a sample of p vector functions, As Noether (Noether and Tavel, 1971) demonstrated, the each with n components. In the second phase, these sam- presence of symmetries is intimately connected with the ples are transformed in order to serve as samples of the presence of such invariants. But to the best of our knowl- symmetry transformations to be compared. These dynam- edge, there have been no prior attempts to use empirical ical symmetries correspond to the p − 1 vector functions, ˜j (x~1 (t)) which map the initial curve ~x1 (t) into each of the ~x measurements of dynamical symmetries to sort systems in the absence of detailed causal models. other sampled curves (so j ∈ 2, . . . , p). Any given symme- try function can thus be decomposed into n functions of n variables: hx̃j,1 (x1,1 , . . . , x1,n ), . . . , x̃j,n (xj,1 , . . . , xj,n )i . 2.3 Learning dynamical kinds To estimate these components for each symmetry, the sam- pled data is rearranged as follows. Using time to match up Though it is not sufficient, sharing a set of dynamical sym- corresponding samples, the values of hx1,1 (t), . . . , x1,n (t)i metries is a necessary condition for two systems to belong to the same dynamical kind. Given that the collection of possi- are paired separately with each component of x~˜i (t). This ble dynamical symmetries is uncountably large as is the set is shown schematically for n = p = 2 in Figure 1. The of interventions one could perform on the system, verifying result is a set of n matrices for each of the p − 1 dynamical that any two systems have all of their dynamical symmetries symmetries implicitly measured in the experiments. in common is not possible. However, it is possible to verify In the final phase, the sets of transformed data representing with confidence that two systems do not share all of their specific dynamical symmetries are compared across sys- symmetries, and thus differ with respect to dynamical kind tems. The output is a judgment that one or more dynamical (and ultimately causal structure). The algorithm described symmetries differ between System 1 and System 2 (the pos- in the next section uses empirical data from systems that itive case), or that the set of symmetries is indistinguishable are presumed to be governed by a deterministic temporal (the negative case). To make this judgment, 10-fold cross- dynamics in order to assess whether or not the systems be- validation is used to estimate the error of two competing ing compared differ with respect to one or more dynamical models, sep and joint. For the sep model, data from System symmetry with respect to time. 1 and System 2 are treated separately. Each of the n com- ponents of the p − 1 symmetry transformations for System 3 The algorithm 1 are fit with polynomial surfaces independently of the n components of the p − 1 symmetries of System 2. For the joint model, the data from System 1 and System 2 for each Comparing dynamical systems proceeds in three phases: of the p − 1 symmetries is combined and treated as a single (i) sampling, (ii) transformation, and (iii) comparison. In sample from the same function. phase (i), the variables v1 , . . . , vn are measured at times t1 , . . . , tm for both System 1 and System 2, starting from For both models, polynomial surfaces are used to provide a p ≥ 2 distinct initial conditions (the set of initial condi- global fit of each component function, ~x˜j,i (~x1 (t)) . These tions is presumed to be identical for the two systems). The surfaces are fit using singular-value decomposition to find the least-squares parameter values for a polynomial in n variables and order d (Golub and Reinsch, 1970). The or- der of the surface is determined by 10-fold cross-validation. Specifically, the error for a fit of order 1 is estimated via 10-fold cross-validation. Higher orders continue to be inves- tigated until the relative reduction in estimated error fails to exceed a tunable threshold, , at which point the previous order considered is used to fit the entire dataset. For the results reported here,  was fixed at 10−4 , though as Figure Algorithm 1 Dynamical Type Comparison 4.5 (e)-(f) indicates, the performance of the algorithm is Input: Aσ,v – for each transformation, σ and target vari- largely insensitive to the value of epsilon over ten orders of able, vi , measured for System 1, contains an m×(n+1) magnitude. matrix with m samples (rows) of v1 , . . . , vn and the value of vi to which σ maps the system state The estimated error of sep and joint cannot be directly com- Bσ,v – for each transformation, σ and target variable, v, pared. If in fact the two systems are of the same dynamical measured for System 2, contains an m × (n + 1) matrix type and thus have all of their symmetries in common, then with m samples (rows) of v1 , . . . , vn and the value of v the two models are expected to have the same error. How- to which σ maps the system state ever, due to random noise in the data and random variation Output: a boolean value indicating whether the data belong in how the data is partitioned, cross-validation will provide to systems of different dynamical types slightly different estimates. To meaningfully compare the 1: for each σ and v, randomize the rows of Aσ,v , Bσ,v competing models, it must be determined how much of the 2: P artition1σ,v ← divide the rows of Aσ,v into 10 seg- difference in error is attributable to noise alone. In order ments to do so, the data from System 1 is divided in two, and 3: P artition2σ,v ← divide the rows of Bσ,v into 10 seg- the same 10-fold cross-validation routine is used to esti- ments mate the errors of joint and combined models. In this case, 4: for i = 1 to 10 do since all of the data comes from the same system, any dif- 5: T rain1σ,v ← S{P artition1σ,v [j]|j 6= i} S ference in estimated error is due to noise in the data. Call 6: T rain2σ,v ← {P artition2σ,v [j]|j 6= i} the absolute value of this difference, δ1 . The procedure 7: T rainJointσ,v ← T rain1σ,v ∪ T rain2σ,v is repeated with the data from System 2 to find δ2 . Then 8: M odelσ,v1 ← FitPolynomial(T rain1σ,v ) δ = max(δ1 , δ2 ) is used as a threshold to compare sep and 9: M odelσ,v ← FitPolynomial(T rain2σ,v ) 2 joint. Specifically, if the mean squared error of the joint 10: M odelσ,v1,2 ← FitPolynomial(T rain1σ,v ∪T rain2σ,v ) model as estimated by cross-validation exceeds the mean square error estimate for the sep model by more than δ, i.e. 11: SE 1 ← S {for all σ, v, the squared errors of M SEjoint > M SEsep + δ, then the hypotheses of shared 1 M odelσ,v 1 symmetries is rejected and the systems are judged to be of 2 S respect to P artitionσ,v [i]} with 12: SE ← {for all σ, v, the squared errors of different kinds. The overall algorithm for the comparison 2 M odelσ,v 2 phase is shown in Algorithm 1. 1,2 withSrespect to P artitionσ,v [i]} 13: SE ← {for all σ, v, the squared errors 1,2 of M odelσ,v with respect to P artition1σ,v [i] ∪ 4 Assessing performance P artition2σ,v [i]} 14: end for 4.1 General considerations 15: M SEsep ← mean(SE 1 ∪ SE 2 ) 16: M SEjoint ← mean(SE 1,2 ) The probability that the algorithm correctly detects a differ- 17: if M SEjoint > M SEsep + threshold(Aσ,v , Bσ,v ) ence in measured symmetries depends upon the details of then the dynamics underlying the two systems, the number of ini- 18: return TRUE tial conditions sampled (i.e., the number of transformations 19: else applied), the time over which the systems were observed, 20: return FALSE and the amount of measurement noise present. All of these 21: end if factors interact with one another, and there is no tractable way to represent, let alone compute, the performance of the algorithm in all conceivable circumstances. Instead, we undertook experiments with simulated data for a couple of nonlinear systems with two aims in mind: (i) to demonstrate that the algorithm does in fact correctly identify whether systems belong to the same dynamical kinds, and (ii) to show how the algorithm behaves as sampling error and the 120 (a) 120 (d) 180 (a) 180 (d) 100 100 160 160 80 80 140 140 120 120 population population population population 60 60 100 100 40 40 80 80 20 r = 1. 5, K = 100, α = 1, 20 r = 1. 5, K = 100, α = 1, 60 60 β = 1, γ = 1 β = 1, γ = 1 40 40 0 0 20 20 20 20 0 0 0 2 4 6 8 10 0 2 4 6 8 10 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 time time time time 120 (b) 120 (e) 160 (b) 180 (e) 100 100 140 160 120 140 80 80 120 population population 100 population 60 population 60 100 80 80 40 40 60 r = 1. 5, K = 100, α = 1, r = 5, K = 100, α = 1, 60 20 20 40 40 0 β = 0. 6, γ = 1 0 β = 1, γ = 1 20 20 20 20 0 0 0 2 4 6 8 10 0 2 4 6 8 10 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 time time time time (c) (f) (c) (f) 100 100 100 100 80 80 80 80 Error (%) Error (%) 60 60 Error (%) Error (%) 60 60 40 40 40 40 20 20 20 20 0 0 0 0 0.0 0.2 0.4 0.6 0.8 1.0 0 1 2 3 4 0.0 0.5 1.0 1.5 2.0 2.5 3.0 1.0 1.5 2.0 2.5 3.0 3.5 4.0 β r5 r2 /r1 r1 Figure 2: (a) A representative sample of data from the sim- Figure 3: (a) Representative sample of data from the sim- ulated logistic growth system with the indicated parameter ulated two-species competitive Lotka-Volterra model with values and Gaussian noise with a standard deviation of 3. noise drawn from a Gaussian distribution with standard de- The two curves correspond to initial populations of 1 (red viation of 5. The trajectories marked with pluses correspond pluses) and 25 (blue dots). (b) Data sampled from a growth to both competing species with initial populations of 1, and system belonging to a different dynamical kind. (c) The the trajectories marked with dots indicate the growth of both error rate of the algorithm as the value of β is varied in one species starting from initial populations of 25 each. (b) Data of the models compared, while fixed at β = 1 for the other. sampled from a system belonging to a different kind, (c) (d) and (e) Representative samples of growth systems of the The error rate of the model as the ratio of growth rates r2 /r1 same dynamical kind. (f) The error rate of the algorithm is varied in one of the systems compared, while the other as a function of the value of r in the underlying general- system is held fixed at r1 = r2 = 2. (d), (e) Representative ized logistic growth model, with one system held fixed at samples from a pair of systems of the same dynamical kind r = 1.5. but with distinct growth rates. (f) The error rate of the algo- rithm as the value of the growth rates r1 and r2 are varied in a fixed ratio r2 /r1 = 2 for one of the systems compared. The horizontal axis shows the value of r1 . The other system degree of similarity of the underlying dynamics are varied is held fixed at r1 = 2, r2 = 4. in isolation. 4.2.1 Biological growth The first class of systems we considered includes a subset 4.2 Synthetic data of a theoretically important family of nonlinear models of biological growth, the so-called “logistic growth models” (Tsoularis and Wallace, 2002). Models in this family are We tested the algorithm against two classes of simulated described by a differential equation of the form: system for which we could be certain of the symmetries of the underlying dynamics and thus of whether or not any γ ẋ = rxα 1 − (x/K)β (2) two systems are of the same dynamical type. The use of simulations also offered control of sampling noise, allowing The variable x refers to the size of a population, while the us to test the robustness of the algorithm against a range of parameters r and K are generally taken to correspond to noise distributions. We focused on systems of importance an intrinsic growth rate and carrying capacity, respectively. to theoretical ecology, a field in which system identification For each value of α, β and γ, Equation 2 yields a distinct is notoriously difficult. class of models of growth dynamics. There is significant controversy over which, if any, of these models describes governed by different members of the family of logistic actual biological populations, in part because it is so difficult growth models. Specifically, we examined the accuracy of to distinguish solutions of the different models from one the algorithm for systems with α = γ = 1 and different another with standard statistical approaches. We focused values of β. For each of 10 values of β 6= 1, we ran 100 on the set of models for which α = β = γ = 1. In trials comparing a system with this value against one in this case, Equation 2 reduces to the well-known Verhulst which β = 1. In all cases, the systems belonged to different logistic equation, ẋ = rx (1 − (x/K)). This equation has dynamical kinds. Of course, the point of the algorithm is not closed-form solutions which we used to simulate growing merely to recognize when dynamical systems are different, populations. Importantly, it is straightforward to solve for but to group systems that, despite appearances, belong to the the set of dynamical symmetries of population with respect same dynamical kind. So in a second set of experiments, we to time. For a transformation, σ to be a dynamical symmetry examined comparisons between two systems governed by of the Verhulst equation according to Definition 2, it must the Verhulst equation (α = β = γ = 1) that differed to vary- satisfy: ing degrees in their respect growth rates, r. One system was fixed at r = 1.5. We ran 100 trials with the second system σ̇(x(t)) = rσ(x(t)) (1 − σ(x(t))/K) (3) at each of 10 different values of r. According to Equation 4, all of these systems belong to the same dynamical kind. This results in a one-parameter family of symmetries: In all of these experiments, we imposed measurement noise that was normally distributed with a standard deviation of 5. σp (x) = Kx/ (1 − e−p )x + e−p K  (4) The results of these experiments are displayed in Figure 2. Note that for different values of K, a different family of We conducted an analogous battery of tests with simulated symmetries and thus a distinct dynamical kind is obtained. Lotka-Volterra systems. These results are shown in Figure However, two systems with equal K values are in the same 3. kind no matter how much r differs between them. The principal lesson to be drawn is that the algorithm works – almost every time it declares two systems to belong to 4.2.2 Species competition different kinds, they in fact do. Of course, as the error The second class of systems we simulated can be viewed rate curves demonstrate, the more similar the trajectories as a generalized logistic growth model that aims to cap- produced by the dynamics underlying two systems are – ture the interaction of two or more species competing for such as when β = 1 in one system and β ≈ 1 in the other resources. Specifically, we looked at the two-species com- – the more likely the algorithm is to miss the difference. petitive Lotka-Volterra model (Pastor, 2011): On the other hand, it’s very seldom fooled into declaring distinct kinds when two systems – no matter how divergent ẋ1 = r1 x1 (1 − (x1 + α12 x2 )/K1 ) their apparent behavior – really do belong to the same kind. (5) ẋ2 = r2 x2 (1 − (x2 + α21 x1 )/K2 ) 4.4 Noise and normality In this case, it’s much more complicated to find closed form expressions for symmetries, in part because the symme- We characterized the robustness of the comparison algo- tries are vector functions, σ (x) = hσ1 (x1 , x2 ), σ2 (x1 , x2 )i. rithm under increasing noise in the single-variable case by However, it is straightforward to show that the symmetry testing it against pairs of samples from two kinds of Verhulst condition implies that the symmetries of the system are func- logarithmic growth system: System 1 for which K = 100 tions only of the ratio of the growth rates, r2 /r1 . We thus and System 2 for which K = 90 (these are in different dy- tested whether the algorithm correctly classifies systems namical kinds since the family of symmetries is a function with the same growth rate ratio in the same dynamical kind of K). For each of the four possible pairings of System 1 despite widely different values of r1 and r2 . Conversely, with System 2, we conducted 25 experiments (for a total we examined the efficiency of the algorithm at discriminat- of 50 experiments in which the systems belonged to the ing systems with distinct growth rate ratios despite similar same dynamical kind, and 50 in which they belonged in values for all parameters. different kinds). In each experiment, Gaussian noise was added to measurements of the state of each simulated system. 4.3 Classification This was repeated for 10 different values of the standard deviation, σ, of the distribution of added noise. A positive To verify that the algorithm does in fact correctly deter- example is one in which the two systems actually differ, a mine whether systems belong to the same dynamical kind negative is when systems of the same type are paired to- and to explore its limitations in doing so, we performed gether. Only 8 out of the 100 trials returned a false positive; experiments with both logistic growth and Lotka-Volterra nearly all errors (by design) were false negatives. The ac- simulations. For growth models, we compared systems gov- curacy of the classifier is plotted in Figure 4 as a function erned by the Verhulst equation (α = β = γ = 1) to systems of the standard deviation of the noise. The algorithm is > 80% accurate until the standard deviation of the noise fit for each sampled symmetry. To examine the effect of reaches more than 12% of the overall range of the dynamical the value of  on the classifier error rate, we tested pairs variable measured. of samples from two kinds of Verhulst logarithmic growth system: System 1 for which K = 100 and System 2 for We conducted analogous experiments in the multi-variable which K = 90 (these are in different dynamical kinds). case by testing pairs of samples from two kinds of Lotka- For each of the four possible pairings of System 1 with Volterra system, specifically one for which r1 = 2, r2 = 4 System 2, we conducted 25 experiments (for a total of 50 and one for which r1 = r2 = 3. Again, we ran 100 trials experiments in which the systems belonged to the same of which 50 involved systems of the same dynamical kind dynamical kind, and 50 in which they belonged in different and 50 involved systems of different kinds. We saw com- kinds). These experiments were repeated for each of 10 parable or better performance with respect to the preceding different values of , spanning 10 orders of magnitude. As is experiment (see Figure 4 (c)). There is, however, a subtle clear from Figure 4.5 (e), there is no significant dependence property of the algorithm’s behavior in this case that bears of the overall error rate on the value chosen. We conducted mentioning. At very low values of noise, it performs no analogous experiments in the multi-variable case by testing better than at high values of noise. The reason for this is pairs of samples from two kinds of Lotka-Volterra system, that for two or more variables, there is no precise overlap specifically one for which r1 = 2, r2 = 4 and one for between the initial, untransformed trajectories of two sys- which r1 = r2 = 3. Again, we ran 100 trials of which tems that aren’t exactly alike in parameter values. So when 50 involved systems of the same dynamical kind and 50 the noise is low enough, the algorithm can always fit a joint involved systems of different kinds. Again, there was no model essentially by tying together two separate models of significant dependence of the overall error rate upon the functions with disjoint support. However, a little bit of noise value of , as indicated in Figure 4.5 (f). blurs the trajectories enough to produce overlap, and the statistics of the joint versus separate fits become dominated by the shapes of the curves, rather than their exact location 100 (a) 100 (b) in phase space. Fortunately, the conservative nature of the 80 80 Error (%) algorithm allows this deficiency to be corrected by injecting 60 60 noise into a sample and watching to see if the algorithm 40 40 changes its mind from “same kind” to “different”. 20 20 0 0 Finally, to get a sense for the fragility of the algorithm 0 5 10 15 20 25 30 0 5 10 15 20 25 30 σ α under violations of normality in the sample noise, we re- peated the same experiments, except that a Gaussian noise 100 (c) 100 (d) source was replaced with the skew normal distribution (Az- 80 80 √ (x−µ)2 Error (%) zalini, 1985). If φ(x) = 1/ 2ω 2 πe− 2ω2 is the stan- 60 60 dard normal probability density function with mean µ and 40 40 20 20 Rx standard deviation ω, Φ(x) = −∞ φ(x0 )dx0 is it’s cor- responding cumulative distribution function, and T (h, α) 0 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 σ α is Owen’s T -function (Owen, 1956), then the cumula- tive distribution function of the skew normal is given by 100 (e) 100 (f) Ψ(x) = Φ( (x−µ) (x−µ) ω ) − 2T ( ω , α). We used the method 80 80 Error (%) of transformation to sample from this skew normal distri- 60 60 bution. When the shape parameter α = 0, the skew normal 40 40 distribution reduces to an ordinary Gaussian with mean µ 20 20 and standard deviation, ω. We tested the algorithm in exper- 0 0 10 8 6 4 2 0 10 8 6 4 2 0 iments in which µ = 0 and α was varied from 0 to 30 while log(²) log(²) keeping the standard deviation of the distribution fixed at 15 by varying ω appropriately. The noise was thus increasingly Figure 4: (a) Error rate as a function of standard deviation right-skewed. Over this range, however, the skew had no of normally distributed noise for logistic growth models. impact on the accuracy of the classifier, as can be seen in (b) Error rate as a function of the α-parameter of the skew Figure 4 (b) and (d). normal distribution for logistic growth systems. (c) Error rate versus standard deviation of normally distributed noise 4.5 Dependence on tunable parameters for two-species Lotka-Volterra systems. (d) Error rate versus α for Lotka-Volterra systems. (e) Error rate as a function As indicated in Section 3 we also examined the sensitivity of log() for logistic growth systems. (f) Error rate as a of the algorithm to the tunable parameter, , used in a cross- function of log() for two Lotka-Volterra systems. validation routine to choose the order of the polynomial 5 Discussion function of y0 , then Equation 6 is an instance of the sort of (possibly nonlinear) model considered by Hoyer et al. The experiments reported here demonstrate that it is possi- (2009). ble to recognize dynamical kinds without any knowledge According to this model, we can express the joint distribu- of the underlying dynamics, and to do so in the face of tion over x and y as: significant noise. By directly assessing whether systems share a common set of dynamical symmetries, it is possible to pool systems for model identification or to validate dy- p(x, y) = px (x)pη (y|x) = px (x)pη (y − f (x; y0 )) (7) namical models without explicitly fitting models to the data. Perhaps most importantly, we have demonstrated that an important sort of scientific inference widely neglected in the Let’s consider x as the index variable and y the target. Then literature on automated discovery is amenable to algorithmic according to the extended definition, a transformation σ(y) treatment. is a dynamical symmetry just if the joint distribution, p(x, y) is the same whether we (i) first apply σ when the expected But the work reported here only begins to exploit the frame- value E[x] = x0 and then intervene to bring the expected work of dynamical kinds. The algorithm described above value of x to E[x] = x0 + δ (changing the probability was designed with a particular class of system in mind, density over x to p̃x (x)); or (ii) bring the expected value of namely those for which the deterministic causal relations x to E[x] = x0 + δ and then apply σ to y. Given the model are well-described by a differential equation in time. In other expressed by Equation 6, this condition is satisfied only if: words, it was assumed that systems of unknown dynamical type are governed by a deterministic, time-dependent dy- namics, and that all random variation observed is noise in the p̃x (x)pη (y − f (x; σ(y0 ))) = measurement process. But the notion of a dynamical kind (8) p̃x (x)pη (y − σ(f (x; y0 ))) as reflected in Definition 2 does not limit one to considering temporal dynamics. And with only modest modification, If we assume that p̃x (x) > 0, this entails that one can accommodate the general case of stochastic dy- namics, that is, causal relations that are inherently noisy, f (x; σ(y0 )) = σ(f (x; y0 )) (9) regardless of the measurement process. The requisite modi- fication is reflected by the following definition: This is exactly the same condition that would have obtained Definition 5 (Dynamical symmetry). Let V be a set of without the additive noise in the model. In general, determin- variables. Let σ be an intervention on the variables in istic systems governed by first-order differential equations Int ⊂ V . The transformation σ is a dynamical symmetry can be cast in the form of Equation 6 without the additive with respect to some index variable X ∈ V − Int if and noise (and it’s not difficult to generalize that form for higher- only if σ has the following property: for all xi and xf , the order dynamics). By adding the noise term, we can consider final probability distribution over V is the same whether σ stochastic versions of the dynamical systems considered is applied when E[X] = xi and then an intervention on X above. For example, we can introduce a stochastic logistic makes it such that E[X] = xf , or the intervention on X is growth model this way: applied first, changing its expected value from xi to xf , and then σ is applied. (K − x0 )y0 x y := +η (10) (K − y0 )x0 + (y0 − x0 )x To see how this extended definition can be applied in the GCM framework, consider the class of two-variable models The variable y represents the population at t + δ which is in which x → y, x is distributed according to px (x), and dependent upon x, the population at t. The variable y0 is the the value of y is determined as a function f of the value of population at t + δ that results from a population of x0 at t. x and an additive noise variable: Though it is perhaps not obvious from the form, note that Equation 10 reduces to the deterministic model when the noise term is set to 0. The symmetry condition expressed y := f (x; y0 ) + η (6) by Equation 9 entails in this case that   The noise variable, η is distributed according to pη (η) with (K − x0 )y0 x σ = a mean value of 0. The function f may be any function (K − y0 )x0 + (y0 − x0 )x from values of x to values of y (not necessarily linear). The (K − x0 )σ(y0 )x parameter y0 is the expected value of y when the expected (K − σ(y0 ))x0 + (σ(y0 ) − x0 )x value of x is x0 . It’s a way of capturing the ‘memory’ and dependence on initial conditions reflected in a differential which is in turn satisfied by equation without unfolding such dependence as an infinite Ky chain of discrete time slices. Note that when f is a constant σp (y) = (1 − e−p )y + e−p K This is exactly the same set of symmetries that characterize Mattingly, J. and W. Warwick (2009, August). Projectible the deterministic dynamical kind (see Equation 4). In other Predicates in Analogue and Simulated Systems. Syn- words, the extended definition of dynamical symmetry picks these 169(3), 465–482. ArticleType: research-article out an expanded class of systems that includes as a proper / Full publication date: Aug., 2009 / Copyright 2009 subset the systems governed by a deterministic Verhulst Springer. equation with a particular value of the carrying capacity, K. Noether, E. and M. A. Tavel (1971). Invariant Variation But it also includes systems with stochastic causal relations Problems. Transport Theory and Statistical Physics 1(3), as described by Equation 10. 186–207. Despite this overlap in conditions for kind membership, Norton, J. P. (1986). An introduction to identification. Lon- detecting sameness or difference of dynamical kind in sys- don ; New York: Academic Press. OCLC: 12237750. tems exhibiting stochastic causation requires an approach Owen, D. B. (1956). Tables for Computing Bivariate Nor- different from the algorithm described above. That’s be- mal Probabilities. The Annals of Mathematical Statis- cause individual trajectories are random walks biased by the tics 27(4), 1075–1090. function f rather than noisy instantiations of deterministic trajectories. We leave the development of methods appropri- Pastor, J. (2011). Mathematical ecology of populations and ate for stochastic causation to future work. But we would ecosystems. John Wiley & Sons. like to reiterate the generality of the theory that has allowed Pearl, J. (2000). Causality : models, reasoning, and infer- us to frame the problem of kind discovery in this extended ence. Cambridge, U.K.; New York: Cambridge Univer- context. We have already demonstrated an algorithm for the sity Press. causal systems of the sort that are the focus of most work Rosen, J. (1995). Symmetry in science : an introduction to in system identification. The point we wish to stress is that the general theory. New York: Springer-Verlag. the theoretical framework of dynamical kinds extends to the relations of stochastic causation at the focus of work in Rosen, J. (2008). Symmetry rules how science and nature causal discovery. Exploiting the framework in this context are founded on symmetry. Berlin: Springer. in order to infer scientifically salient kinds, validate mod- Schmidt, M. and H. Lipson (2009, April). Distilling els, and pool data for model construction will require only Free-Form Natural Laws from Experimental Data. Sci- modest algorithmic innovation. ence 324(5923), 81–85. Spirtes, P., C. N. Glymour, and R. Scheines (2000). Cau- Acknowledgements sation, prediction, and search (2nd ed ed.). Adaptive This material is based upon work supported by the National computation and machine learning. Cambridge, Mass: Science Foundation under Grant No. 1454190. MIT Press. Spirtes, P. and K. Zhang (2016, February). Causal discov- References ery and inference: concepts and recent methodological advances. Applied Informatics 3(1), 1–28. Azzalini, A. (1985). A Class of Distributions Which In- Tsoularis, A. and J. Wallace (2002, July). Analysis of lo- cludes the Normal Ones. Scandinavian Journal of Statis- gistic growth models. Mathematical Biosciences 179(1), tics 12(2), 171–178. 21–55. Golub, P. G. H. and D. C. Reinsch (1970, April). Sin- Turchin, P. (2001, July). Does population ecology have gular value decomposition and least squares solutions. general laws? Oikos 94(1), 17–26. Numerische Mathematik 14(5), 403–420. Hoyer, P. O., D. Janzing, J. M. Mooij, J. Peters, and B. Schlkopf (2009). Nonlinear causal discovery with additive noise models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou (Eds.), Advances in Neural In- formation Processing Systems 21, pp. 689–696. Curran Associates, Inc. Jantzen, B. C. (2014, December). Projection, symmetry, and natural kinds. Synthese 192(11), 3617–3646. Ljung, L. (2010, April). Perspectives on system identifica- tion. Annual Reviews in Control 34(1), 1–12. Logemann, H. and E. P. Ryan (2014). Ordinary Differential Equations: Analysis, Qualitative Theory and Control. London: Springer-Verlag.