 # 3.7] How do I know if my data are deterministic? (nonlinear science)

## Description

This article is from the Nonlinear Science FAQ, by James D. Meiss jdm@boulder.colorado.edu with numerous contributions by others.


(Thanks to Justin Lipton for contributing to this answer)

How can I tell if my data are deterministic? This is a very tricky problem. It
is difficult because in practice no time series consists of pure 'signal.'
There will always be some form of corrupting noise, even if it is present only
as round-off or truncation error, or as a result of finite-precision arithmetic
or quantization. Thus any real time series, even if mostly deterministic, will
be a stochastic process to some extent.

All methods for distinguishing deterministic and stochastic processes rely on
the fact that a deterministic system will always evolve in the same way from a
given starting point. Thus, given a time series that we are testing for
determinism, we
(1) pick a test state,
(2) search the time series for a similar or 'nearby' state, and
(3) compare their respective time evolutions.

Define the error as the difference between the time evolution of the 'test'
state and the time evolution of the nearby state. A deterministic system will
have an error that either remains small (stable, regular solution) or increases
exponentially with time (chaotic solution). A stochastic system will have a
randomly distributed error.
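The three steps above can be sketched directly on a scalar time series. The following is a minimal, illustrative sketch (the function name, the brute-force neighbour search, and the temporal exclusion window are our own choices, not part of the FAQ):

```python
import numpy as np

def divergence_curve(series, test_idx, horizon, exclude=10):
    """Error between a test state and its nearest neighbour, step by step.

    (1) the test state is series[test_idx];
    (2) the nearest neighbour is found by brute force, skipping an
        `exclude`-point window around the test state so we do not pick
        its own immediate past or future;
    (3) the two evolutions are compared over `horizon` steps.
    """
    n = len(series) - horizon
    dists = np.abs(series[:n] - series[test_idx])
    lo, hi = max(0, test_idx - exclude), min(n, test_idx + exclude + 1)
    dists[lo:hi] = np.inf          # ignore temporally adjacent states
    nbr = int(np.argmin(dists))
    return np.abs(series[test_idx:test_idx + horizon] -
                  series[nbr:nbr + horizon])

# deterministic example: the chaotic logistic map (parameters are illustrative)
x = np.empty(5000)
x[0] = 0.4
for t in range(4999):
    x[t + 1] = 3.99 * x[t] * (1 - x[t])
err = divergence_curve(x, test_idx=1000, horizon=50)
```

For deterministic data the curve starts small and either stays small or grows roughly exponentially until it saturates at the attractor size; for noise it jumps around at the noise level from the first step.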

Essentially all measures of determinism taken from time series rely upon
finding the closest states to a given 'test' state (i.e., correlation
dimension, Lyapunov exponents, etc.). To define the state of a system one
typically relies on phase space embedding methods, see [3.14].
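For concreteness, the standard time-delay construction can be written in a few lines (function name and defaults are ours; see [3.14] for the theory behind the choice of dimension and delay):

```python
import numpy as np

def delay_embed(series, dim, tau=1):
    """Reconstruct dim-dimensional states from a scalar time series.

    Row t of the result is the delay vector
    [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)].
    """
    n = len(series) - (dim - 1) * tau
    return np.column_stack([series[i * tau : i * tau + n]
                            for i in range(dim)])

states = delay_embed(np.arange(10.0), dim=3, tau=2)
```

Each row of `states` is then treated as one point in the reconstructed phase space, and 'nearby' means nearby in that space rather than in the raw scalar values.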

Typically one chooses an embedding dimension and investigates the propagation
of the error between two nearby states. If the error looks random, one
increases the dimension. If you can increase the dimension until the error
looks deterministic, then you are done. Though it may sound simple, in
practice it is not! One complication is that as the dimension increases, the
search for a nearby state requires much more computation time and much more
data (the amount of data required increases exponentially with embedding
dimension) to find a suitably close candidate. If the embedding dimension
(number of measurements per state) is chosen too small (less than the 'true'
value), deterministic data can appear to be random, but in theory there is no
problem with choosing the dimension too large--the method will still work.
Practically, anything approaching about 10 dimensions is considered so large
that a stochastic description is probably more suitable and convenient anyway.
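One way to picture this dimension-raising procedure is to embed the series at several dimensions and, for each, measure how well each state's nearest neighbour predicts its near future. The sketch below is purely illustrative (the subsampling stride, horizon, and map parameters are our own choices):

```python
import numpy as np

def delay_embed(series, dim, tau=1):
    """Delay vectors [x(t), x(t+tau), ..., x(t+(dim-1)*tau)] as rows."""
    n = len(series) - (dim - 1) * tau
    return np.column_stack([series[i * tau : i * tau + n]
                            for i in range(dim)])

def mean_neighbour_error(series, dim, tau=1, horizon=1, exclude=10):
    """Average error between each state and its nearest neighbour,
    both evolved `horizon` steps forward, in a dim-dimensional embedding."""
    states = delay_embed(series, dim, tau)
    m = len(states) - horizon
    errs = []
    for i in range(0, m, 5):       # subsample test states for speed
        d = np.linalg.norm(states[:m] - states[i], axis=1)
        lo, hi = max(0, i - exclude), min(m, i + exclude + 1)
        d[lo:hi] = np.inf          # skip temporally adjacent states
        j = int(np.argmin(d))
        errs.append(np.linalg.norm(states[i + horizon] -
                                   states[j + horizon]))
    return float(np.mean(errs))

# deterministic series: the chaotic logistic map
x = np.empty(3000)
x[0] = 0.4
for t in range(2999):
    x[t + 1] = 3.99 * x[t] * (1 - x[t])

for dim in (1, 2, 3):
    print(dim, mean_neighbour_error(x, dim, horizon=1))
```

For the deterministic series the mean error stays small once the dimension is large enough, whereas for independent noise it stays at the noise level no matter how far the dimension is raised, which is the practical signature described above.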

See e.g.,
Sugihara, G. and R. M. May (1990). "Nonlinear Forecasting as a Way of
Distinguishing Chaos from Measurement Error in Time Series." Nature
344: 734-740.
