<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>We demonstrate the possibility of classifying
causal systems into kinds that share a common
structure without first constructing an explicit
dynamical model or using prior knowledge of the
system dynamics. The algorithmic ability to
determine whether arbitrary systems are governed
by causal relations of the same form offers
significant practical applications in the development
and validation of dynamical models. It is also
of theoretical interest as an essential stage in the
scientific inference of laws from empirical data.
The algorithm presented is based on the
dynamical symmetry approach to dynamical kinds. A
dynamical symmetry with respect to time is an
intervention on one or more variables of a system
that commutes with the time evolution of the
system. A dynamical kind is a class of systems
sharing a set of dynamical symmetries. The algorithm
presented classifies deterministic, time-dependent
causal systems by directly comparing their
exhibited symmetries. Using simulated, noisy data
from a variety of nonlinear systems, we show
that this algorithm correctly sorts systems into
dynamical kinds. It is robust under significant
sampling error, is immune to violations of
normality in sampling error, and fails gracefully with
increasing dynamical similarity. The algorithm
we demonstrate is the first to address this aspect
of automated scientific discovery.</p>
    </sec>
    <sec id="sec-2">
      <title>The problem</title>
      <p>
        Recent decades have seen remarkable progress in addressing
the problem of learning causal structure from observational
or experimental data. One especially rich approach deploys
the framework of graphical causal models (GCMs)
        <xref ref-type="bibr" rid="ref12 ref16 ref17">(Pearl,
2000; Spirtes et al., 2000; Spirtes and Zhang, 2016)</xref>
        . Given
a particular set of variables describing a specific system or
population, the learning problem is, roughly, to ascertain the
set of direct, possibly stochastic causal relations amongst
variables and, often, a model of the explicit functional form
of those causal dependencies. Similarly, there is vast
literature on system identification that focuses on producing
a model of a given complex dynamical system that is
suitable for prediction and control
        <xref ref-type="bibr" rid="ref5 ref9">(Norton, 1986; Ljung, 2010)</xref>
        .
While these lines of research differ in terminology, methods,
and emphasis – system identification tends to focus on
deterministic systems, while the GCM formalism is designed for
representing stochastic causal relations – they both provide
methods for discovering causal structure. They also share
an emphasis on the individual system – particular systems
of causal relations are grouped together into categories only
after the fact, and generally without a consistent aim or
motivation.
      </p>
      <p>This exclusive focus on individual systems, however, leaves
out a significant element of scientific inference. Much
scientific work, especially in the early exploration of a new
phenomena, involves grouping distinct physical systems into
categories. These categories are then targets for law-like
generalization. In other words, a primary goal of scientific
research is the identification of laws of kinds of system, not
just models of one-off systems. So, for example, we seek
laws of chemical kinetics, not just models of the
particular reaction in some specific beaker on a lab bench. We
seek laws of motion, not just a model of how this particular
pendulum moves given a variable arm length. This sort
of inference to kinds is not reflected in the causal learning
or system identification algorithms that have been
developed – both approaches to learning causal structure allow
for classification only after model creation.</p>
      <p>
        But what if such classification could be done prior to
learning detailed models of the causal structure of individual
systems? In that case, a number of inferential tasks would
become substantially easier. For example, if one knew in
advance that multiple systems are governed by a common
type of causal structure – whether representable by, say, a
parametric class of differential equations or an equivalence
class of GCMs – then one could pool data from multiple
systems in order to determine the form of the differential
equation or an equivalence class of GCMs governing
systems of that sort. Or consider the problem of model
validation. For sufficiently complex computer models, validation
can be exceedingly difficult because the dynamical relations
amongst high-level variables in the simulation are unknown
        <xref ref-type="bibr" rid="ref15 ref7">(Mattingly and Warwick, 2009)</xref>
        . But if it were possible to
ascertain directly – without an explicit model of the
highlevel dynamics in either the simulation or the target system
– that the dynamics are of the same type, this would provide
a tractable route to validation. Finally, one could assess
whether or not a system with unknown dynamics exhibits
transitions between different dynamical regimes, such as the
transition from laminar to turbulent flow in fluid dynamics.
In such transitions, one and the same system exhibits a
different dynamical form or causal structure at different times
or in different circumstances.
      </p>
      <p>
        For these reasons, a method for assessing the sameness
or difference of dynamical structure in the absence of any
explicit dynamical model would be both practically and
theoretically valuable. For such a direct classification to be
possible, two things are required. First, a clear and rigorous
definition of dynamical kind, and second, an empirical test
for sameness of kind. We suggest that the first component –
a rigorous definition – has been provided and well-motivated
by the theoretical work of
        <xref ref-type="bibr" rid="ref4">Jantzen (2014)</xref>
        . To meet the
second need, we present here both a test derived from this
notion of dynamical kind and an explicit algorithm for
automating that test that is accurate and robust in the presence
of significant measurement error in empirical data.
1.1
      </p>
      <p>
        Directly discerning dynamical difference: an
example
To get a sense for how such a test might work, consider two
different species of bacteria, A and B, in dilute solutions of
nutrient-rich media. One might be interested in knowing
whether the growth of both species is governed by the same
dynamics under these conditions, as suggested by the “First
Law of Ecology”
        <xref ref-type="bibr" rid="ref19">(Turchin, 2001)</xref>
        . Specifically, one might
be interested in whether both systems are well-represented
by differential equations of the same form, whatever that
form may be. The question can be approached directly by
considering the sorts of properties shared by two systems
governed by equations of the same form. One such property
is the behavior of the system under intervention. More
specifically, for any given dynamics there is a special class
of transformations of the system state that commute with
the evolution of that system state through time.
      </p>
      <p>To be concrete, suppose species A grows exponentially,
such that dxA(t)=dt = rAxA(t). In that case, all and only
scaling transformations of the form k(xA(t)) = kxA(t)
commute with the time-evolution of the system so that it
doesn’t matter whether one applies k and then evolves the
system or evolves the system and then applies k. Note that
this is equivalent to the condition that d k(xA(t))=dt =
rA k(xA(t)). Even if these transformations were unknown
to us, there is a simple way to ascertain whether the k for
species A are different from the corresponding
transformations, , that commute with the time evolution of species
B. To do so, we can read these transformations – at least
over a finite interval – directly off of our experimental data.
Suppose we make simultaneous measurements of each
population at m times, t0; t1; : : : ; tm. Suppose further that, for
each species, two experiments are conducted. In the first,
an intervention sets the initial populations to xA(t0) and
xB(t0) for species A and B, respectively, and in the
second we set the initial populations to x~A(t0) and x~B(t0).
For each population, the resulting pair of trajectories must
be connected by one of the special time-commuting
transformations characteristic of that system. Empirical curves
for and can thus be constructed by plotting x~A(ti) as
a function of xA(ti) and x~B(ti) as a function of xB(ti).
To decide whether the dynamics of the two systems have
the same form it is only necessary to determine whether
(xA) x~A(xA) = x~B(xB) (xB). If species A really
grows exponentially, then x~A(xA) = kxA, and detecting a
difference in dynamics depends only on detecting deviation
from a straight line in the sample of x~B(xB).
2
2.1</p>
      <p>Theoretical background and related work</p>
      <sec id="sec-2-1">
        <title>Theory of dynamical kinds</title>
        <p>The preceding example appealed to a feature of the
dynamics of all exponential growth systems. That is, dynamical
systems were grouped into categories on the basis of
highlevel features of their dynamics – in this case, the form of
the governing differential equation – rather than features of
specific trajectories or solutions of that equation. This is
the essence of a dynamical kind – a class of causal systems
that share some salient higher-order features though they
may differ dramatically in the specific relations amongst
variables.</p>
        <p>
          Of course, there are indefinitely many higher-order features
of causal structure that could be used for classification in
this way. We have chosen to focus on the variety of
higherorder properties that were used to sort dynamical kinds in
the bacterial growth example. Specifically, the notion of
dynamical kind we propose to exploit for purposes of
automated discovery is that proposed by
          <xref ref-type="bibr" rid="ref4">Jantzen (2014)</xref>
          . The
motivation is two-fold. First, Jantzen’s proposal picks out
classes of causal system that, if identified in the absence
of detailed individual models, would provide the sort of
practical benefits for data-pooling, validation, and
regimedetection described above. Second, as Jantzen argues in
detail, the classes carved out in this way coincide well with
existing scientific practice. Thus, automating the
discovery of dynamical kinds in Jantzen’s sense would go a long
way toward filling a gap in automated scientific inference,
namely the absence of algorithmic methods for determining
classes of system appropriate for framing laws.
        </p>
        <p>
          In Jantzen’s view, what makes the dynamics of two distinct
systems members of the same kind is a higher-order
property that they share: the structured set of transformations of
each variable that commute with incrementation of another
variable. These transformations are called ‘dynamical
symmetries’ and are defined as follows
          <xref ref-type="bibr" rid="ref4 ref6">(Jantzen, 2014, p3633)</xref>
          :
Definition 1 (Dynamical symmetry). Let V be a set of
variables. Let be an intervention on the variables in
Int V . The transformation is a dynamical symmetry
with respect to some index variable X 2 V Int if and only
if has the following property: for all xi and xf , the final
state of the system is the same whether is applied when
X = xi and then an intervention on X makes it such that
X = xf , or the intervention on X is applied first, changing
its value from xi to xf , and then is applied.
        </p>
        <p>
          Notice that this definition is substantive rather than
mathematical in that it appeals to physical interactions of a certain
sort, namely interventions in the GCM sense. As such, this
definition of dynamical symmetry is not tied to any
particular formalism or application. The notion of commuting
interventions is certainly at home in the GCM framework,
but it is also easily translated into the language of
differential equations as the commutation of system evolution and
the manipulation of initial or boundary conditions. It is the
latter framework we focus on in the remainder of this paper.
In particular, we will be concerned with classes of system
that are well-described by differential equations. In fact,
we’ll further restrict attention to systems with explicit time
dependence. In this special case of temporal dynamics, time
is a uniquely natural “index variable” – in a sense, one can’t
help but “intervene” to increment time in any given
experiment. Thus, we are primarily interested in interventions on
subsets of variables which are dynamical symmetries with
respect to time
          <xref ref-type="bibr" rid="ref4 ref6">(Jantzen, 2014, p3632)</xref>
          :
Definition 2 (Dynamical symmetry with respect to time).
Let t be the variable representing time, and let V be a set
of additional dynamical variables such that t 2= V . Let
be an intervention on the variables in Int V . The
transformation is a dynamical symmetry with respect to
time if and only if for all intervals t, the final state of the
system is the same whether is applied at some time t0 and
the system evolved until t0 + t, or the system first allowed
to evolve from t0 to t0 + t and then is applied.
To put this in terms familiar from mathematical physics
          <xref ref-type="bibr" rid="ref13 ref14">(Rosen, 1995, 2008)</xref>
          , let t;s(x) be the propagator for a
given system, i.e., the function t;s : X ! X for which
t;s(x) is the state of the system at time t given that x 2 X
is the state at time s. A function : X ! X represents a
dynamical symmetry with respect to time if and only if for
every s; t &gt; s; and x, t;s( (x)) = ( t;s(x)). Put this
way, it’s clear that most symmetries of interest to physicists
are dynamical symmetries with respect to time. Assuming
that the dynamics is deterministic and well-described by a
system of ordinary differential equations in time, we can
also restate the condition in terms of flows
          <xref ref-type="bibr" rid="ref6">(Logemann and
Ryan, 2014)</xref>
          . These are mappings : X R ! X where
X is the space of sates of the system at times represented
by the reals and under the conditions that (x; 0) = 0 and
( (x; t); s) = (x; t + s). Dynamical symmetries in the
framework correspond to functions such that, for all x
and t:
( (x); t) = ( (x; t))
(1)
Whatever formalism is appropriate for expressing dynamical
symmetries in a particular application, it is not necessarily
true that all functions satisfying a given formal condition
are proper dynamical symmetries. The latter are defined
in terms of interventions on variables – interactions with
a system that change its measured quantities in a way that
is at least physically possible and which does not destroy
the causal connections amongst them. Some formal
transformations may represent unrealizable interventions such
as, perhaps, time reversal. For deterministic dynamics
represented by a system of ODEs in time, every dynamical
symmetry can be represented by a mapping satisfying the
condition expressed by Equation 1. But if we take Jantzen’s
definition seriously, then it may not be the case that every
such represents a dynamical symmetry. However, since
it is impossible to give a general account of what these
exceptions are, we will make the idealizing assumption that,
for the varieties of system under analysis here, there is a
one-to-one correspondence between dynamical symmetries
and functions satisfying Equation 1.
        </p>
        <p>
          In Jantzen’s sense, a bare set of dynamical symmetries is
insufficient to characterize a dynamical kind. Further
distinctions can be made if we note that for any given system
of causally connected variables, the corresponding set of
dynamical symmetries exhibits structure under composition.
In other words, one dynamical symmetry applied after
another is also a dynamical symmetry, and the equivalence
relations amongst such compositions constitute a symmetry
structure
          <xref ref-type="bibr" rid="ref4 ref6">(Jantzen, 2014, p3634-3635)</xref>
          :
Definition 3 (Nontrivial symmetry structure). The
symmetry structure of a collection of dynamical symmetries,
= f iji = 1; 2; : : : g is given by the composition function
: ! . A nontrivial symmetry structure is one that
contains the identity (i.e., “do nothing”) transformation and,
for at least one variable Y relative to some index variable
X, is not isomorphic to the group of mappings from Y to
itself (i.e., the set of all transformations of Y ).
        </p>
        <p>
          Jantzen identifies these nontrivial symmetry structures with
the dynamical kinds of interest
          <xref ref-type="bibr" rid="ref6">(2014, p3635)</xref>
          :
Definition 4 (Dynamical kind). A dynamical kind is a class
of systems of variables that share a set of dynamical
symmetries that are related by a non-trivial symmetry structure.
As suggested above, Jantzen’s theory of dynamical kinds is
closely connected to the axiomatic treatment of causation
that underwrites the increasingly diverse and powerful set
of causal discovery algorithms. In particular, Jantzen shows
that a system of variables possesses a nontrivial symmetry
structure S (one that is not isomorphic to the set of
automorphisms on the states of S) if and only if it has a non-trivial
causal structure (one for which at least one variable is a
direct cause of another)
          <xref ref-type="bibr" rid="ref6">(2014, p3635-3636)</xref>
          . This is the sense
in which dynamical symmetries are high-level properties
of the causal structure of a system. There is also precedent
for Jantzen’s appeal to symmetries in the machine learning
literature.
          <xref ref-type="bibr" rid="ref15">Schmidt and Lipson (2009)</xref>
          developed an
automated discovery algorithm that learns free-form laws by
seeking invariants of the motion – functions of the
dynamical variables whose values remain constant through time.
As Noether
          <xref ref-type="bibr" rid="ref8">(Noether and Tavel, 1971)</xref>
          demonstrated, the
presence of symmetries is intimately connected with the
presence of such invariants. But to the best of our
knowledge, there have been no prior attempts to use empirical
measurements of dynamical symmetries to sort systems in
the absence of detailed causal models.
2.3
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Learning dynamical kinds</title>
        <p>Though it is not sufficient, sharing a set of dynamical
symmetries is a necessary condition for two systems to belong to
the same dynamical kind. Given that the collection of
possible dynamical symmetries is uncountably large as is the set
of interventions one could perform on the system, verifying
that any two systems have all of their dynamical symmetries
in common is not possible. However, it is possible to verify
with confidence that two systems do not share all of their
symmetries, and thus differ with respect to dynamical kind
(and ultimately causal structure). The algorithm described
in the next section uses empirical data from systems that
are presumed to be governed by a deterministic temporal
dynamics in order to assess whether or not the systems
being compared differ with respect to one or more dynamical
symmetry with respect to time.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>The algorithm</title>
      <p>Comparing dynamical systems proceeds in three phases:
(i) sampling, (ii) transformation, and (iii) comparison. In
phase (i), the variables v1; : : : ; vn are measured at times
t1; : : : ; tm for both System 1 and System 2, starting from
p 2 distinct initial conditions (the set of initial
conditions is presumed to be identical for the two systems). The
*</p>
      <p>v0
2 a00
6 a10
66 a20
4 ..</p>
      <p>.
a00
a10
a20
.
.</p>
      <p>.
v1</p>
      <p>w

v~0
b00
b10
b20
.
.</p>
      <p>.
v1
b11 7
b21 77
. 5
.
.</p>
      <p>v~1
result, for each system, is a sample of p vector functions,
each with n components. In the second phase, these
samples are transformed in order to serve as samples of the
symmetry transformations to be compared. These
dynamical symmetries correspond to the p 1 vector functions,
~x~j (x~1(t)) which map the initial curve ~x1(t) into each of the
other sampled curves (so j 2 2; : : : ; p). Any given
symmetry function can thus be decomposed into n functions of n
variables: hx~j;1(x1;1; : : : ; x1;n); : : : ; x~j;n(xj;1; : : : ; xj;n)i .
To estimate these components for each symmetry, the
sampled data is rearranged as follows. Using time to match up
corresponding samples, the values of hx1;1(t); : : : ; x1;n(t)i
are paired separately with each component of x~~i(t). This
is shown schematically for n = p = 2 in Figure 1. The
result is a set of n matrices for each of the p 1 dynamical
symmetries implicitly measured in the experiments.
In the final phase, the sets of transformed data representing
specific dynamical symmetries are compared across
systems. The output is a judgment that one or more dynamical
symmetries differ between System 1 and System 2 (the
positive case), or that the set of symmetries is indistinguishable
(the negative case). To make this judgment, 10-fold
crossvalidation is used to estimate the error of two competing
models, sep and joint. For the sep model, data from System
1 and System 2 are treated separately. Each of the n
components of the p 1 symmetry transformations for System
1 are fit with polynomial surfaces independently of the n
components of the p 1 symmetries of System 2. For the
joint model, the data from System 1 and System 2 for each
of the p 1 symmetries is combined and treated as a single
sample from the same function.</p>
      <p>
        For both models, polynomial surfaces are used to provide a
global fit of each component function, ~x~j;i(~x1(t)) . These
surfaces are fit using singular-value decomposition to find
the least-squares parameter values for a polynomial in n
variables and order d
        <xref ref-type="bibr" rid="ref2">(Golub and Reinsch, 1970)</xref>
        . The
order of the surface is determined by 10-fold cross-validation.
Specifically, the error for a fit of order 1 is estimated via
10-fold cross-validation. Higher orders continue to be
investigated until the relative reduction in estimated error fails to
exceed a tunable threshold, , at which point the previous
order considered is used to fit the entire dataset. For the
results reported here, was fixed at 10 4, though as Figure
4.5 (e)-(f) indicates, the performance of the algorithm is
largely insensitive to the value of epsilon over ten orders of
magnitude.
      </p>
      <p>The estimated error of sep and joint cannot be directly
compared. If in fact the two systems are of the same dynamical
type and thus have all of their symmetries in common, then
the two models are expected to have the same error.
However, due to random noise in the data and random variation
in how the data is partitioned, cross-validation will provide
slightly different estimates. To meaningfully compare the
competing models, it must be determined how much of the
difference in error is attributable to noise alone. In order
to do so, the data from System 1 is divided in two, and
the same 10-fold cross-validation routine is used to
estimate the errors of joint and combined models. In this case,
since all of the data comes from the same system, any
difference in estimated error is due to noise in the data. Call
the absolute value of this difference, 1. The procedure
is repeated with the data from System 2 to find 2. Then
= max( 1; 2) is used as a threshold to compare sep and
joint. Specifically, if the mean squared error of the joint
model as estimated by cross-validation exceeds the mean
square error estimate for the sep model by more than , i.e.
M SEjoint &gt; M SEsep + , then the hypotheses of shared
symmetries is rejected and the systems are judged to be of
different kinds. The overall algorithm for the comparison
phase is shown in Algorithm 1.
4
4.1</p>
    </sec>
    <sec id="sec-4">
      <title>Assessing performance</title>
      <sec id="sec-4-1">
        <title>General considerations</title>
        <p>The probability that the algorithm correctly detects a
difference in measured symmetries depends upon the details of
the dynamics underlying the two systems, the number of
initial conditions sampled (i.e., the number of transformations
applied), the time over which the systems were observed,
and the amount of measurement noise present. All of these
factors interact with one another, and there is no tractable
way to represent, let alone compute, the performance of
the algorithm in all conceivable circumstances. Instead, we
undertook experiments with simulated data for a couple of
nonlinear systems with two aims in mind: (i) to demonstrate
that the algorithm does in fact correctly identify whether
systems belong to the same dynamical kinds, and (ii) to
show how the algorithm behaves as sampling error and the
Algorithm 1 Dynamical Type Comparison
Input: A ;v – for each transformation, and target
variable, vi, measured for System 1, contains an m (n+1)
matrix with m samples (rows) of v1; : : : ; vn and the
value of vi to which maps the system state
B ;v – for each transformation, and target variable, v,
measured for System 2, contains an m (n + 1) matrix
with m samples (rows) of v1; : : : ; vn and the value of v
to which maps the system state
Output: a boolean value indicating whether the data belong
to systems of different dynamical types
1: for each and v, randomize the rows of A ;v; B ;v
2: P artition1;v divide the rows of A ;v into 10
segments
3: P artition2;v divide the rows of B ;v into 10
segments
4: for i = 1 to 10 do
5: T rain1;v SfP artition1;v[j]jj 6= ig
6: T rain2;v SfP artition2;v[j]jj 6= ig
7: T rainJ oint ;v T rain1;v [ T rain2;v
8: M odel1;v FitPolynomial(T rain1;v)
9: M odel2;v FitPolynomial(T rain2;v)
10: M odel1;;2v FitPolynomial(T rain1;v[T rain2;v)
11: SE1 Sffor all ; v, the squared errors of</p>
        <p>M odel1;v with respect to P artition1;v[i]g
12: SE2 Sffor all ; v, the squared errors of</p>
        <p>M odel2;v with respect to P artition2;v[i]g
13: SE1;2 Sffor all ; v, the squared errors
of M odel1;;2v with respect to P artition1;v[i] [
P artition2;v[i]g
14: end for
15: M SEsep mean(SE1 [ SE2)
16: M SEjoint mean(SE1;2)
17: if M SEjoint &gt; M SEsep + threshold(A ;v; B ;v)
then
18: return TRUE
19: else
20: return FALSE
21: end if
120
100
n80
o
it60
a
lu40
p
op20
0
200
120
100
n80
o
it60
a
lu40
p
op20
0
200
100
)80
(%60
r
o
rr40
E
20
(a)
(b)
(c)
120
100
n80
o
it60
a
lu40
p
op20
0
10 200
100
)80
(%60
r
o
rr40
E
20
00
degree of similarity of the underlying dynamics are varied
in isolation.
4.2</p>
      </sec>
      <sec id="sec-4-2">
        <title>Synthetic data</title>
        <p>
          We tested the algorithm against two classes of simulated
system for which we could be certain of the symmetries
of the underlying dynamics and thus of whether or not any
two systems are of the same dynamical type. The use of
simulations also offered control of sampling noise, allowing
us to test the robustness of the algorithm against a range of
noise distributions. We focused on systems of importance
to theoretical ecology, a field in which system identification
is notoriously difficult.
0 1.0 1.5 2.0 2.5 3.0 3.5 4r.0
1
The first class of systems we considered includes a subset
of a theoretically important family of nonlinear models of
biological growth, the so-called “logistic growth models”
          <xref ref-type="bibr" rid="ref18">(Tsoularis and Wallace, 2002)</xref>
          . Models in this family are
described by a differential equation of the form:
x_ = rx
1
(x=K)
(2)
The variable x refers to the size of a population, while the
parameters r and K are generally taken to correspond to
an intrinsic growth rate and carrying capacity, respectively.
For each value of ; and , Equation 2 yields a distinct
class of models of growth dynamics. There is significant
controversy over which, if any, of these models describes
actual biological populations, in part because it is so difficult
to distinguish solutions of the different models from one
another with standard statistical approaches. We focused
on the set of models for which = = = 1. In
this case, Equation 2 reduces to the well-known Verhulst
logistic equation, x_ = rx (1 (x=K)). This equation has
closed-form solutions which we used to simulate growing
populations. Importantly, it is straightforward to solve for
the set of dynamical symmetries of population with respect
to time. For a transformation, to be a dynamical symmetry
of the Verhulst equation according to Definition 2, it must
satisfy:
_ (x(t)) = r (x(t)) (1
(x(t))=K)
This results in a one-parameter family of symmetries:
p(x) = Kx= (1
e p)x + e pK
Note that for different values of K, a different family of
symmetries and thus a distinct dynamical kind is obtained.
However, two systems with equal K values are in the same
kind no matter how much r differs between them.
4.2.2
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Species competition</title>
        <p>
          The second class of systems we simulated can be viewed
as a generalized logistic growth model that aims to
capture the interaction of two or more species competing for
resources. Specifically, we looked at the two-species
competitive Lotka-Volterra model
          <xref ref-type="bibr" rid="ref11">(Pastor, 2011)</xref>
          :
x_ 1 = r1x1 (1
x_ 2 = r2x2 (1
(x1 +
(x2 +
12x2)=K1)
21x1)=K2)
In this case, it’s much more complicated to find closed form
expressions for symmetries, in part because the
symmetries are vector functions, (x) = h 1(x1; x2); 2(x1; x2)i.
However, it is straightforward to show that the symmetry
condition implies that the symmetries of the system are
functions only of the ratio of the growth rates, r2=r1. We thus
tested whether the algorithm correctly classifies systems
with the same growth rate ratio in the same dynamical kind
despite widely different values of r1 and r2. Conversely,
we examined the efficiency of the algorithm at
discriminating systems with distinct growth rate ratios despite similar
values for all parameters.
4.3
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Classification</title>
        <p>To verify that the algorithm does in fact correctly
determine whether systems belong to the same dynamical kind
and to explore its limitations in doing so, we performed
experiments with both logistic growth and Lotka-Volterra
simulations. For growth models, we compared systems
governed by the Verhulst equation ( = = = 1) to systems
(3)
(4)
(5)
governed by different members of the family of logistic
growth models. Specifically, we examined the accuracy of
the algorithm for systems with = = 1 and different
values of . For each of 10 values of 6= 1, we ran 100
trials comparing a system with this value against one in
which = 1. In all cases, the systems belonged to different
dynamical kinds. Of course, the point of the algorithm is not
merely to recognize when dynamical systems are different,
but to group systems that, despite appearances, belong to the
same dynamical kind. So in a second set of experiments, we
examined comparisons between two systems governed by
the Verhulst equation ( = = = 1) that differed to
varying degrees in their respect growth rates, r. One system was
fixed at r = 1:5. We ran 100 trials with the second system
at each of 10 different values of r. According to Equation
4, all of these systems belong to the same dynamical kind.
In all of these experiments, we imposed measurement noise
that was normally distributed with a standard deviation of 5.
The results of these experiments are displayed in Figure 2.
We conducted an analogous battery of tests with simulated
Lotka-Volterra systems. These results are shown in Figure
3.</p>
        <p>The principal lesson to be drawn is that the algorithm works
– almost every time it declares two systems to belong to
different kinds, they in fact do. Of course, as the error
rate curves demonstrate, the more similar the trajectories
produced by the dynamics underlying two systems are –
such as when = 1 in one system and 1 in the other
– the more likely the algorithm is to miss the difference.
On the other hand, it’s very seldom fooled into declaring
distinct kinds when two systems – no matter how divergent
their apparent behavior – really do belong to the same kind.
4.4</p>
      </sec>
      <sec id="sec-4-5">
        <title>Noise and normality</title>
        <p>We characterized the robustness of the comparison
algorithm under increasing noise in the single-variable case by
testing it against pairs of samples from two kinds of Verhulst
logarithmic growth system: System 1 for which K = 100
and System 2 for which K = 90 (these are in different
dynamical kinds since the family of symmetries is a function
of K). For each of the four possible pairings of System 1
with System 2, we conducted 25 experiments (for a total
of 50 experiments in which the systems belonged to the
same dynamical kind, and 50 in which they belonged in
different kinds). In each experiment, Gaussian noise was
added to measurements of the state of each simulated system.
This was repeated for 10 different values of the standard
deviation, , of the distribution of added noise. A positive
example is one in which the two systems actually differ, a
negative is when systems of the same type are paired
together. Only 8 out of the 100 trials returned a false positive;
nearly all errors (by design) were false negatives. The
accuracy of the classifier is plotted in Figure 4 as a function
of the standard deviation of the noise. The algorithm is
fit for each sampled symmetry. To examine the effect of
the value of on the classifier error rate, we tested pairs
of samples from two kinds of Verhulst logarithmic growth
system: System 1 for which K = 100 and System 2 for
which K = 90 (these are in different dynamical kinds).
For each of the four possible pairings of System 1 with
System 2, we conducted 25 experiments (for a total of 50
experiments in which the systems belonged to the same
dynamical kind, and 50 in which they belonged in different
kinds). These experiments were repeated for each of 10
different values of , spanning 10 orders of magnitude. As is
clear from Figure 4.5 (e), there is no significant dependence
of the overall error rate on the value chosen. We conducted
analogous experiments in the multi-variable case by testing
pairs of samples from two kinds of Lotka-Volterra system,
specifically one for which r1 = 2; r2 = 4 and one for
which r1 = r2 = 3. Again, we ran 100 trials of which
50 involved systems of the same dynamical kind and 50
involved systems of different kinds. Again, there was no
significant dependence of the overall error rate upon the
value of , as indicated in Figure 4.5 (f).
&gt; 80% accurate until the standard deviation of the noise
reaches more than 12% of the overall range of the dynamical
variable measured.</p>
        <p>
          We conducted analogous experiments in the multi-variable
case by testing pairs of samples from two kinds of
LotkaVolterra system, specifically one for which r1 = 2; r2 = 4
and one for which r1 = r2 = 3. Again, we ran 100 trials
of which 50 involved systems of the same dynamical kind
and 50 involved systems of different kinds. We saw
comparable or better performance with respect to the preceding
experiment (see Figure 4 (c)). There is, however, a subtle
property of the algorithm’s behavior in this case that bears
mentioning. At very low values of noise, it performs no
better than at high values of noise. The reason for this is
that for two or more variables, there is no precise overlap
between the initial, untransformed trajectories of two
systems that aren’t exactly alike in parameter values. So when
the noise is low enough, the algorithm can always fit a joint
model essentially by tying together two separate models of
functions with disjoint support. However, a little bit of noise
blurs the trajectories enough to produce overlap, and the
statistics of the joint versus separate fits become dominated
by the shapes of the curves, rather than their exact location
in phase space. Fortunately, the conservative nature of the
algorithm allows this deficiency to be corrected by injecting
noise into a sample and watching to see if the algorithm
changes its mind from “same kind” to “different”.
Finally, to get a sense for the fragility of the algorithm
under violations of normality in the sample noise, we
repeated the same experiments, except that a Gaussian noise
source was replaced with the skew normal distribution
          <xref ref-type="bibr" rid="ref1">(Azzalini, 1985)</xref>
          . If (x) = 1=p2!2 e (x2!2)2 is the
standard normal probability density function with mean and
standard deviation !, (x) = R x (x0)dx0 is it’s
cor1
responding cumulative distribution function, and T (h; )
is Owen’s T -function
          <xref ref-type="bibr" rid="ref10">(Owen, 1956)</xref>
          , then the
cumulative distribution function of the skew normal is given by
(x) = ( (x! ) ) 2T ( (x! ) ; ). We used the method
of transformation to sample from this skew normal
distribution. When the shape parameter = 0, the skew normal
distribution reduces to an ordinary Gaussian with mean
and standard deviation, !. We tested the algorithm in
experiments in which = 0 and was varied from 0 to 30 while
keeping the standard deviation of the distribution fixed at 15
by varying ! appropriately. The noise was thus increasingly
right-skewed. Over this range, however, the skew had no
impact on the accuracy of the classifier, as can be seen in
Figure 4 (b) and (d).
4.5
        </p>
      </sec>
      <sec id="sec-4-6">
        <title>Dependence on tunable parameters</title>
        <p>As indicated in Section 3 we also examined the sensitivity
of the algorithm to the tunable parameter, , used in a
crossvalidation routine to choose the order of the polynomial</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>The experiments reported here demonstrate that it is
possible to recognize dynamical kinds without any knowledge
of the underlying dynamics, and to do so in the face of
significant noise. By directly assessing whether systems
share a common set of dynamical symmetries, it is possible
to pool systems for model identification or to validate
dynamical models without explicitly fitting models to the data.
Perhaps most importantly, we have demonstrated that an
important sort of scientific inference widely neglected in the
literature on automated discovery is amenable to algorithmic
treatment.</p>
      <p>But the work reported here only begins to exploit the
framework of dynamical kinds. The algorithm described above
was designed with a particular class of system in mind,
namely those for which the deterministic causal relations
are well-described by a differential equation in time. In other
words, it was assumed that systems of unknown dynamical
type are governed by a deterministic, time-dependent
dynamics, and that all random variation observed is noise in the
measurement process. But the notion of a dynamical kind
as reflected in Definition 2 does not limit one to considering
temporal dynamics. And with only modest modification,
one can accommodate the general case of stochastic
dynamics, that is, causal relations that are inherently noisy,
regardless of the measurement process. The requisite
modification is reflected by the following definition:
Definition 5 (Dynamical symmetry). Let V be a set of
variables. Let be an intervention on the variables in
Int V . The transformation is a dynamical symmetry
with respect to some index variable X 2 V Int if and
only if has the following property: for all xi and xf , the
final probability distribution over V is the same whether
is applied when E[X] = xi and then an intervention on X
makes it such that E[X] = xf , or the intervention on X is
applied first, changing its expected value from xi to xf , and
then is applied.</p>
      <p>
        To see how this extended definition can be applied in the
GCM framework, consider the class of two-variable models
in which x ! y, x is distributed according to px(x), and
the value of y is determined as a function f of the value of
x and an additive noise variable:
y := f (x; y0) +
(6)
The noise variable, is distributed according to p ( ) with
a mean value of 0. The function f may be any function
from values of x to values of y (not necessarily linear). The
parameter y0 is the expected value of y when the expected
value of x is x0. It’s a way of capturing the ‘memory’ and
dependence on initial conditions reflected in a differential
equation without unfolding such dependence as an infinite
chain of discrete time slices. Note that when f is a constant
function of y0, then Equation 6 is an instance of the sort
of (possibly nonlinear) model considered by
        <xref ref-type="bibr" rid="ref3">Hoyer et al.
(2009)</xref>
        .
      </p>
      <p>According to this model, we can express the joint
distribution over x and y as:
p(x; y) = px(x)p (yjx) = px(x)p (y
f (x; y0)) (7)
Let’s consider x as the index variable and y the target. Then
according to the extended definition, a transformation (y)
is a dynamical symmetry just if the joint distribution, p(x; y)
is the same whether we (i) first apply when the expected
value E[x] = x0 and then intervene to bring the expected
value of x to E[x] = x0 + (changing the probability
density over x to p~x(x)); or (ii) bring the expected value of
x to E[x] = x0 + and then apply to y. Given the model
expressed by Equation 6, this condition is satisfied only if:
If we assume that p~x(x) &gt; 0, this entails that
p~x(x)p (y</p>
      <p>f (x; (y0))) =
p~x(x)p (y</p>
      <p>(f (x; y0)))
f (x; (y0)) = (f (x; y0))
(8)
(9)
This is exactly the same condition that would have obtained
without the additive noise in the model. In general,
deterministic systems governed by first-order differential equations
can be cast in the form of Equation 6 without the additive
noise (and it’s not difficult to generalize that form for
higherorder dynamics). By adding the noise term, we can consider
stochastic versions of the dynamical systems considered
above. For example, we can introduce a stochastic logistic
growth model this way:
y :=
(K</p>
      <p>x0)y0x
(K y0)x0 + (y0 x0)x
The variable y represents the population at t + which is
dependent upon x, the population at t. The variable y0 is the
population at t + that results from a population of x0 at t.
Though it is perhaps not obvious from the form, note that
Equation 10 reduces to the deterministic model when the
noise term is set to 0. The symmetry condition expressed
by Equation 9 entails in this case that
+
(10)
(K</p>
      <p>x0)y0x
(K
y0)x0 + (y0
x0)x
=
which is in turn satisfied by
(K
p(y) =
(1
(K x0) (y0)x
(y0))x0 + ( (y0)</p>
      <p>x0)x</p>
      <p>Ky
e p)y + e pK
This is exactly the same set of symmetries that characterize
the deterministic dynamical kind (see Equation 4). In other
words, the extended definition of dynamical symmetry picks
out an expanded class of systems that includes as a proper
subset the systems governed by a deterministic Verhulst
equation with a particular value of the carrying capacity, K.
But it also includes systems with stochastic causal relations
as described by Equation 10.</p>
      <p>Despite this overlap in conditions for kind membership,
detecting sameness or difference of dynamical kind in
systems exhibiting stochastic causation requires an approach
different from the algorithm described above. That’s
because individual trajectories are random walks biased by the
function f rather than noisy instantiations of deterministic
trajectories. We leave the development of methods
appropriate for stochastic causation to future work. But we would
like to reiterate the generality of the theory that has allowed
us to frame the problem of kind discovery in this extended
context. We have already demonstrated an algorithm for the
causal systems of the sort that are the focus of most work
in system identification. The point we wish to stress is that
the theoretical framework of dynamical kinds extends to
the relations of stochastic causation at the focus of work in
causal discovery. Exploiting the framework in this context
in order to infer scientifically salient kinds, validate
models, and pool data for model construction will require only
modest algorithmic innovation.</p>
      <sec id="sec-5-1">
        <title>Acknowledgements</title>
        <p>This material is based upon work supported by the National
Science Foundation under Grant No. 1454190.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Azzalini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>1985</year>
          ).
          <article-title>A Class of Distributions Which Includes the Normal Ones</article-title>
          .
          <source>Scandinavian Journal of Statistics</source>
          <volume>12</volume>
          (
          <issue>2</issue>
          ),
          <fpage>171</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Golub</surname>
            ,
            <given-names>P. G. H.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Reinsch</surname>
          </string-name>
          (
          <year>1970</year>
          , April).
          <article-title>Singular value decomposition and least squares solutions</article-title>
          .
          <source>Numerische Mathematik</source>
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <fpage>403</fpage>
          -
          <lpage>420</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Hoyer</surname>
            ,
            <given-names>P. O.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Janzing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Mooij</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peters</surname>
          </string-name>
          , and B.
          <string-name>
            <surname>Schlkopf</surname>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Nonlinear causal discovery with additive noise models</article-title>
          . In D. Koller,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and L.
          <string-name>
            <surname>Bottou</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          <volume>21</volume>
          , pp.
          <fpage>689</fpage>
          -
          <lpage>696</lpage>
          . Curran Associates, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Jantzen</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          (
          <year>2014</year>
          , December).
          <article-title>Projection, symmetry, and natural kinds</article-title>
          .
          <source>Synthese</source>
          <volume>192</volume>
          (
          <issue>11</issue>
          ),
          <fpage>3617</fpage>
          -
          <lpage>3646</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Ljung</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2010</year>
          , April).
          <source>Perspectives on system identification. Annual Reviews in Control 34(1)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Logemann</surname>
            , H. and
            <given-names>E. P.</given-names>
          </string-name>
          <string-name>
            <surname>Ryan</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Ordinary Differential Equations: Analysis, Qualitative Theory and Control</article-title>
          . London: Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Mattingly</surname>
            , J. and
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Warwick</surname>
          </string-name>
          (
          <year>2009</year>
          ,
          <article-title>August)</article-title>
          .
          <source>Projectible Predicates in Analogue and Simulated Systems. Synthese</source>
          <volume>169</volume>
          (
          <issue>3</issue>
          ),
          <fpage>465</fpage>
          -
          <lpage>482</lpage>
          . ArticleType: research-article / Full publication date: Aug.,
          <year>2009</year>
          / Copyright 2009 Springer.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Noether</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Tavel</surname>
          </string-name>
          (
          <year>1971</year>
          ).
          <article-title>Invariant Variation Problems</article-title>
          .
          <source>Transport Theory and Statistical Physics</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>186</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Norton</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          (
          <year>1986</year>
          ).
          <article-title>An introduction to identification</article-title>
          . London ; New York: Academic Press. OCLC:
          <volume>12237750</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Owen</surname>
            ,
            <given-names>D. B.</given-names>
          </string-name>
          (
          <year>1956</year>
          ).
          <article-title>Tables for Computing Bivariate Normal Probabilities</article-title>
          .
          <source>The Annals of Mathematical Statistics</source>
          <volume>27</volume>
          (
          <issue>4</issue>
          ),
          <fpage>1075</fpage>
          -
          <lpage>1090</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Pastor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Mathematical ecology of populations and ecosystems</article-title>
          . John Wiley &amp; Sons.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Pearl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Causality : models, reasoning, and inference</article-title>
          . Cambridge, U.K.; New York: Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Rosen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Symmetry in science : an introduction to the general theory</article-title>
          . New York: Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Rosen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Symmetry rules how science and nature are founded on symmetry</article-title>
          . Berlin: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Lipson</surname>
          </string-name>
          (
          <year>2009</year>
          , April).
          <source>Distilling Free-Form Natural Laws from Experimental Data. Science</source>
          <volume>324</volume>
          (
          <issue>5923</issue>
          ),
          <fpage>81</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Spirtes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>C. N.</given-names>
            <surname>Glymour</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Scheines</surname>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Causation, prediction, and search</article-title>
          (2nd ed ed.).
          <article-title>Adaptive computation and machine learning</article-title>
          . Cambridge, Mass: MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Spirtes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          (
          <year>2016</year>
          ,
          <article-title>February)</article-title>
          .
          <article-title>Causal discovery and inference: concepts and recent methodological advances</article-title>
          .
          <source>Applied Informatics</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Tsoularis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>J</given-names>
            .
            <surname>Wallace</surname>
          </string-name>
          (
          <year>2002</year>
          ,
          <article-title>July)</article-title>
          .
          <article-title>Analysis of logistic growth models</article-title>
          .
          <source>Mathematical Biosciences</source>
          <volume>179</volume>
          (
          <issue>1</issue>
          ),
          <fpage>21</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Turchin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2001</year>
          ,
          <article-title>July)</article-title>
          .
          <source>Does population ecology have general laws? Oikos</source>
          <volume>94</volume>
          (
          <issue>1</issue>
          ),
          <fpage>17</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>