Safe Haskell | None |
---|---|

Language | Haskell2010 |

Online statistics for ordered data (such as time-series data), modelled as mealy machines

## Synopsis

- newtype Mealy a b = Mealy {}
- pattern M :: (c -> b) -> (c -> a -> c) -> (a -> c) -> Mealy a b
- scan :: Mealy a b -> [a] -> [b]
- fold :: Mealy a b -> [a] -> b
- newtype Averager a b = Averager {
- sumCount :: (a, b)

- pattern A :: a -> b -> Averager a b
- av :: Divisive a => Averager a a -> a
- av_ :: (Eq a, Additive a, Divisive a) => Averager a a -> a -> a
- online :: (Divisive b, Additive b) => (a -> b) -> (b -> b) -> Mealy a b
- ma :: (Divisive a, Additive a) => a -> Mealy a a
- absma :: (Divisive a, Additive a, Signed a) => a -> Mealy a a
- sqma :: (Divisive a, Additive a) => a -> Mealy a a
- std :: (Divisive a, ExpField a) => a -> Mealy a a
- cov :: Field a => Mealy a a -> Mealy (a, a) a
- corrGauss :: ExpField a => a -> Mealy (a, a) a
- corr :: ExpField a => Mealy a a -> Mealy a a -> Mealy (a, a) a
- beta1 :: ExpField a => Mealy a a -> Mealy (a, a) a
- alpha1 :: ExpField a => Mealy a a -> Mealy (a, a) a
- reg1 :: ExpField a => Mealy a a -> Mealy (a, a) (a, a)
- beta :: (Field a, Field a, KnownNat n) => a -> Mealy (Array '[n] a, a) (Array '[n] a)
- alpha :: (Field a, ExpField a, KnownNat n) => a -> Mealy (Array '[n] a, a) a
- reg :: (Field a, ExpField a, KnownNat n) => a -> Mealy (Array '[n] a, a) (Array '[n] a, a)
- asum :: Additive a => Mealy a a
- aconst :: b -> Mealy a b
- delay1 :: a -> Mealy a a
- delay :: [a] -> Mealy a a
- depState :: (a -> b -> a) -> Mealy a b -> Mealy a a
- data Model1 = Model1 {}
- zeroModel1 :: Model1
- depModel1 :: Double -> Model1 -> Mealy Double Double
- fromFoldl :: Fold a b -> Mealy a b
- foldB :: Reifies s W => (BVar s Double -> BVar s Double) -> BVar s Double -> BVar s [Double] -> BVar s Double
- maB :: Reifies s W => BVar s Double -> BVar s [Double] -> BVar s Double
- data Medianer a b = Medianer {}
- onlineL1 :: (Ord b, Field b, Signed b) => b -> b -> (a -> b) -> (b -> b) -> Fold a b
- onlineL1' :: (Ord b, Field b, Signed b) => b -> b -> (a -> b) -> (b -> b) -> Fold a (b, b)
- maL1 :: (Ord a, Field a, Signed a) => a -> a -> a -> Fold a a
- absmaL1 :: (Ord a, Field a, Signed a) => a -> a -> a -> Fold a a

# Types

A `Mealy`

is a triple of functions

- (c -> b)
**extract**Convert state to the output type. - (c -> a -> c)
**step**Update state given prior state and (new) input. - (a -> c)
**inject**Convert an input into the state type.

The type is a newtype wrapper around `L1`

in `Fold`

.

**inject** is necessary to kick off state on a `fold`

or `scan`

, rather than a state existing prior to the fold or scan (this is a Moore machine or `M1`

in `Fold`

).

scan (M e s i) (x : xs) = e <$> scanl' s (i x) xs

pattern M :: (c -> b) -> (c -> a -> c) -> (a -> c) -> Mealy a b Source #

Pattern for a `Mealy`

.

M extract step inject

scan :: Mealy a b -> [a] -> [b] Source #

Run a list through a `Mealy`

and return a list of values for every step

length (scan _ xs) == length xs

Most common statistics are averages, which are some sort of aggregation of values (sum) and some sort of sample size (count).

av :: Divisive a => Averager a a -> a Source #

extract the average from an `Averager`

av gives NaN on zero divide

av_ :: (Eq a, Additive a, Divisive a) => Averager a a -> a -> a Source #

substitute a default value on zero-divide

av_ (Averager (0,0)) x == x

online :: (Divisive b, Additive b) => (a -> b) -> (b -> b) -> Mealy a b Source #

`online f g`

is a `Mealy`

where f is a transformation of the data and g is a decay function (convergent tozero) applied at each step.

online id id == av

# Statistics

Generate some random variates for the examples.

xs0, xs1 & xs2 are samples from N(0,1)

xsp is a pair of N(0,1)s with a correlation of 0.8

`>>>`

`:set -XDataKinds`

`>>>`

`import Control.Category ((>>>))`

`>>>`

`import Data.List`

`>>>`

`import Data.Simulate`

`>>>`

`g <- create`

`>>>`

`xs0 <- rvs g 10000`

`>>>`

`xs1 <- rvs g 10000`

`>>>`

`xs2 <- rvs g 10000`

`>>>`

`xsp <- rvsp g 10000 0.8`

ma :: (Divisive a, Additive a) => a -> Mealy a a Source #

A moving average using a decay rate of r. r=1 represents the simple average, and r=0 represents the latest value.

`>>>`

100.0`fold (ma 0) (fromList [1..100])`

`>>>`

50.5`fold (ma 1) (fromList [1..100])`

`>>>`

-4.292501077490672e-2`fold (ma 0.99) xs0`

A change in the underlying mean at n=10000 in the chart below highlights the trade-off between stability of the statistic and response to non-stationarity.

absma :: (Divisive a, Additive a, Signed a) => a -> Mealy a a Source #

absolute average

`>>>`

0.7894201075535578`fold (absma 1) xs0`

sqma :: (Divisive a, Additive a) => a -> Mealy a a Source #

average square

fold (ma r) . fmap (**2) == fold (sqma r)

std :: (Divisive a, ExpField a) => a -> Mealy a a Source #

standard deviation

The construction of standard deviation, using the Applicative instance of a `Mealy`

:

(\s ss -> sqrt (ss - s ** (one+one))) <$> ma r <*> sqma r

The average deviation of the numbers 1..1000 is about 1 / sqrt 12 * 1000 https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)#Standard_uniform

`>>>`

288.9636655359978`fold (std 1) [0..1000]`

The average deviation with a decay of 0.99

`>>>`

99.28328803163829`fold (std 0.99) [0..1000]`

`>>>`

0.9923523681261158`fold (std 1) xs0`

cov :: Field a => Mealy a a -> Mealy (a, a) a Source #

The covariance of a tuple given an underlying central tendency fold.

`>>>`

0.8011368250045314`fold (cov (ma 1)) xsp`

corrGauss :: ExpField a => a -> Mealy (a, a) a Source #

correlation of a tuple, specialised to Guassian

`>>>`

0.8020637696465039`fold (corrGauss 1) xsp`

corr :: ExpField a => Mealy a a -> Mealy a a -> Mealy (a, a) a Source #

a generalised version of correlation of a tuple

`>>>`

0.8020637696465039`fold (corr (ma 1) (std 1)) xsp`

corr (ma r) (std r) == corrGauss r

beta1 :: ExpField a => Mealy a a -> Mealy (a, a) a Source #

The beta in a simple linear regression of an (independent variable, single dependent variable) tuple given an underlying central tendency fold.

This is a generalisation of the classical regression formula, where averages are replaced by `Mealy`

statistics.

\[ \begin{align} \beta & = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \\ & = \frac{n^2 \overline{xy} - n^2 \bar{x} \bar{y}}{n^2 \overline{x^2} - n^2 \bar{x}^2} \\ & = \frac{\overline{xy} - \bar{x} \bar{y}}{\overline{x^2} - \bar{x}^2} \\ \end{align} \]

`>>>`

0.9953875263096014`fold (beta1 (ma 1)) $ zipWith (\x y -> (y, x + y)) xs0 xs1`

alpha1 :: ExpField a => Mealy a a -> Mealy (a, a) a Source #

The alpha in a simple linear regression of an (independent variable, single dependent variable) tuple given an underlying central tendency fold.

\[ \begin{align} \alpha & = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2} \\ & = \frac{n^2 \bar{y} \overline{x^2} - n^2 \bar{x} \overline{xy}}{n^2 \overline{x^2} - n^2 \bar{x}^2} \\ & = \frac{\bar{y} \overline{x^2} - \bar{x} \overline{xy}}{\overline{x^2} - \bar{x}^2} \\ \end{align} \]

`>>>`

1.1880996822796197e-2`fold (alpha1 (ma 1)) $ zipWith (\x y -> ((3+y), x + 0.5 * (3 + y))) xs0 xs1`

reg1 :: ExpField a => Mealy a a -> Mealy (a, a) (a, a) Source #

The (alpha, beta) tuple in a simple linear regression of an (independent variable, single dependent variable) tuple given an underlying central tendency fold.

`>>>`

(1.1880996822796197e-2,0.49538752630956845)`fold (reg1 (ma 1)) $ zipWith (\x y -> ((3+y), x + 0.5 * (3 + y))) xs0 xs1`

beta :: (Field a, Field a, KnownNat n) => a -> Mealy (Array '[n] a, a) (Array '[n] a) Source #

multiple regression

\[ \begin{align} {\hat {{\mathbf {B}}}}=({\mathbf {X}}^{{{\rm {T}}}}{\mathbf {X}})^{{ -1}}{\mathbf {X}}^{{{\rm {T}}}}{\mathbf {Y}} \end{align} \]

\[ \begin{align} {\mathbf {X}}={\begin{bmatrix}{\mathbf {x}}_{1}^{{{\rm {T}}}}\\{\mathbf {x}}_{2}^{{{\rm {T}}}}\\\vdots \\{\mathbf {x}}_{n}^{{{\rm {T}}}}\end{bmatrix}}={\begin{bmatrix}x_{{1,1}}&\cdots &x_{{1,k}}\\x_{{2,1}}&\cdots &x_{{2,k}}\\\vdots &\ddots &\vdots \\x_{{n,1}}&\cdots &x_{{n,k}}\end{bmatrix}} \end{align} \]

let ys = zipWith3 (\x y z -> 0.1 * x + 0.5 * y + 1 * z) xs0 xs1 xs2 let zs = zip (zipWith (\x y -> fromList [x,y] :: F.Array '[2] Double) xs1 xs2) ys fold (beta 0.99) zs

- 0.4982692361226971, 1.038192474255091

alpha :: (Field a, ExpField a, KnownNat n) => a -> Mealy (Array '[n] a, a) a Source #

alpha in a multiple regression

reg :: (Field a, ExpField a, KnownNat n) => a -> Mealy (Array '[n] a, a) (Array '[n] a, a) Source #

multiple regression

let ys = zipWith3 (\x y z -> 0.1 * x + 0.5 * y + 1 * z) xs0 xs1 xs2 let zs = zip (zipWith (\x y -> fromList [x,y] :: F.Array '[2] Double) xs1 xs2) ys fold (reg 0.99) zs

([0.4982692361226971, 1.038192474255091],2.087160803386695e-3)

:: [a] | initial statistical values, delay equals length |

-> Mealy a a |

delays values by n steps

delay [0] == delay1 0

delay [] == id

delay [1,2] = delay1 2 . delay1 1

`>>>`

[-2,-1,0,1]`scan (delay [-2,-1]) [0..3]`

Autocorrelation example:

scan (((,) <$> id <*> delay [0]) >>> beta (ma 0.99)) xs0

depState :: (a -> b -> a) -> Mealy a b -> Mealy a a Source #

Add a state dependency to a series.

Typical regression analytics tend to assume that moments of a distributional assumption are unconditional with respect to prior instantiations of the stochastics being studied.

For time series analytics, a major preoccupation is estimation of the current moments given what has happened in the past.

IID:

\[ \begin{align} x_{t+1} & = alpha_t^x + s_{t+1}\\ s_{t+1} & = alpha_t^s * N(0,1) \end{align} \]

Example: including a linear dependency on moving average history:

\[ \begin{align} x_{t+1} & = (alpha_t^x + beta_t^{x->x} * ma_t^x) + s_{t+1}\\ s_{t+1} & = alpha_t^s * N(0,1) \end{align} \]

`>>>`

`let xs' = scan (depState (\a m -> a + 0.1 * m) (ma 0.99)) xs0`

`>>>`

`let ma' = scan ((ma (1 - 0.01)) >>> delay [0]) xs'`

`>>>`

`let xsb = fold (beta1 (ma (1 - 0.001))) $ drop 1 $ zip ma' xs'`

`>>>`

`-- beta measurement if beta of ma was, in reality, zero.`

`>>>`

`let xsb0 = fold (beta1 (ma (1 - 0.001))) $ drop 1 $ zip ma' xs0`

`>>>`

9.999999999999976e-2`xsb - xsb0`

This simple model of relationship between a series and it's historical average shows how fragile the evidence can be.

In unravelling the drivers of this result, the standard deviation of a moving average scan seems well behaved for r > 0.01, but increases substantively for values less than this. This result seems to occur for wide beta values. For high r, the standard deviation of the moving average seems to be proprtional to r**0.5, and equal to around (0.5*r)**0.5.

fold (std 1) (scan (ma (1 - 0.01)) xs0)

a linear model of state dependencies for the first two moments

\[ \begin{align} x_{t+1} & = (alpha_t^x + beta_t^{x->x} * ma_t^x + beta_t^{s->x} * std_t^x) + s_{t+1}\\ s_{t+1} & = (alpha_t^s + beta_t^{x->s} * ma_t^x + beta_t^{s->s} * std_t^x) * N(0,1) \end{align} \]

## Instances

Eq Model1 Source # | |

Show Model1 Source # | |

Generic Model1 Source # | |

type Rep Model1 Source # | |

Defined in Data.Mealy type Rep Model1 = D1 (MetaData "Model1" "Data.Mealy" "online-0.6.0-HLmMvMxp7IYLqRFxok4q8s" False) (C1 (MetaCons "Model1" PrefixI True) ((S1 (MetaSel (Just "alphaX") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Double) :*: (S1 (MetaSel (Just "alphaS") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Double) :*: S1 (MetaSel (Just "betaMa2X") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Double))) :*: (S1 (MetaSel (Just "betaMa2S") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Double) :*: (S1 (MetaSel (Just "betaStd2X") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Double) :*: S1 (MetaSel (Just "betaStd2S") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Double))))) |

zeroModel1 :: Model1 Source #

depModel1 :: Double -> Model1 -> Mealy Double Double Source #

Apply a model1 relationship using a single decay factor.

`>>>`

`:set -XOverloadedLabels`

`>>>`

-0.47228537123218206`fold (depModel1 0.01 (zeroModel1 & #betaMa2X .~ 0.1)) xs0`

# conversion

foldB :: Reifies s W => (BVar s Double -> BVar s Double) -> BVar s Double -> BVar s [Double] -> BVar s Double Source #

# median

A rough Median. The average absolute value of the stat is used to callibrate estimate drift towards the median

onlineL1 :: (Ord b, Field b, Signed b) => b -> b -> (a -> b) -> (b -> b) -> Fold a b Source #

onlineL1 takes a function and turns it into a `Fold`

where the step is an incremental update of an (isomorphic) median statistic.

onlineL1' :: (Ord b, Field b, Signed b) => b -> b -> (a -> b) -> (b -> b) -> Fold a (b, b) Source #

onlineL1' takes a function and turns it into a `Fold`

where the step is an incremental update of an (isomorphic) median statistic.