Framework

Let the observed data y_{1:T} = (y₁, …, y_T), indexed by time, come from a parametric model with time-indexed parameter vectors θ_t and density P(y_t|θ_t). The parameters in θ_t may be time varying or time invariant. Anomalies are modelled as parametric epidemic changepoints: changes in the parameters θ_t that are common to all time steps within the anomaly.

The ith anomalous period consists of n_i consecutive time steps, which are denoted by the set T^[i]. The K anomalous periods are pairwise disjoint, so $T^{\left[i\right]} \cap T^{\left[j\right]} = \emptyset$ for all i ≠ j, and are ordered such that $\max\limits_{t \in T^{\left[i\right]}} t < \min\limits_{t \in T^{\left[j\right]}} t$ for all i < j. The variation in the parameters caused by the anomalous periods is given by $$ \theta_{t} = \left\{ \begin{array}{ll} \theta_{t}^{\left[1\right]} & t \in T^{\left[1\right]} \\ & \vdots \\ \theta_{t}^{\left[K\right]} & t \in T^{\left[K\right]} \\ \theta_{t}^{\left[0\right]} & \mathrm{otherwise} \end{array} \right. $$

The density and the values of θ_t^[0] determine the non-anomalous behaviour of the process generating the observed data. If these are considered known a priori, then the anomalous periods can be determined by selecting K, T^[1], …, T^[K] to minimise the penalised cost

$$ \sum\limits_{t\notin\cup T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[0\right]}\right) + \sum\limits_{i=1,\ldots,K}\left\{ \min_{\theta_{t}^{\left[i\right]}}\left( \sum\limits_{t \in T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[i\right]}\right) \right) + \beta \right\} $$

subject to n_i > l. The minimum anomaly length l depends on the anomaly cost function 𝒞(y_t, θ_t) and ensures that the minimum with respect to θ_t^[i] can be found. Concrete examples of cost functions for this framework can be found in the cost function vignettes.
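The penalised cost above can be evaluated directly for a candidate set of anomalous periods. The following is a minimal sketch, not the package's implementation. It assumes a squared-error cost 𝒞(y_t, θ_t) = (y_t − μ)², a known non-anomalous mean `mu0`, and a hypothetical representation of each T^[i] as a half-open index range `(start, end)`; the function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def cost(y: float, mu: float) -> float:
    """Assumed cost: squared error about the mean mu (an illustrative choice)."""
    return (y - mu) ** 2


def penalised_cost(
    y: Sequence[float],
    segments: List[Tuple[int, int]],
    beta: float,
    mu0: float = 0.0,
    min_len: int = 1,
) -> float:
    """Penalised cost of a candidate set of anomalous periods.

    segments holds half-open ranges (start, end), ordered and disjoint.
    Each period must satisfy n_i > min_len.
    """
    total = 0.0
    prev_end = 0
    for start, end in segments:
        if start < prev_end or end > len(y):
            raise ValueError("segments must be ordered, disjoint and in range")
        if end - start <= min_len:
            raise ValueError("each anomalous period needs n_i > min_len")
        # Non-anomalous points before this period use the known mean mu0.
        total += sum(cost(v, mu0) for v in y[prev_end:start])
        # Inside the period the cost is minimised over the mean, which for
        # squared error is attained at the sample mean of the period.
        seg = y[start:end]
        mu_hat = sum(seg) / len(seg)
        total += sum(cost(v, mu_hat) for v in seg) + beta
        prev_end = end
    # Remaining non-anomalous points after the last period.
    total += sum(cost(v, mu0) for v in y[prev_end:])
    return total


y = [0.0, 0.0, 3.0, 3.0, 3.0, 0.0]
print(penalised_cost(y, [], beta=2.0))        # no anomalous periods
print(penalised_cost(y, [(2, 5)], beta=2.0))  # one period covering the shifted points
```

In this toy series, declaring the three shifted points anomalous replaces their cost under `mu0` with a single penalty β, so the second configuration has the lower penalised cost.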

One possible definition of 𝒞(y_t, θ_t) is the negative log-likelihood of the data under the parametric model. In such cases, common choices for the penalty β are based on information criteria used for model selection. As noted in <>, many of these criteria perform poorly in practical settings. Instead, in the following section the CROPS algorithm, which offers a graphical method for selecting the penalty term in changepoint analysis, is adapted for use in this anomaly framework.
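To illustrate a likelihood-based cost and why a minimum anomaly length is needed, here is a hedged sketch for a Gaussian model in which both the mean and the variance may change within an anomalous period. It uses twice the negative log-likelihood, minimised over the period's mean and variance; the scaling and the function name `gaussian_segment_cost` are assumptions made for this example.

```python
import math
from typing import Sequence


def gaussian_segment_cost(segment: Sequence[float]) -> float:
    """Twice the Gaussian negative log-likelihood of a segment, minimised
    over its mean and variance (illustrative sketch only).

    With the maximum likelihood estimates plugged in, this equals
    n * (log(2 * pi * var_hat) + 1).
    """
    n = len(segment)
    if n < 2:
        # A single point gives var_hat = 0, so the cost is unbounded below
        # and the minimum over the parameters is not attained.
        raise ValueError("need at least two points to estimate the variance")
    mean_hat = sum(segment) / n
    var_hat = sum((v - mean_hat) ** 2 for v in segment) / n
    if var_hat == 0.0:
        raise ValueError("degenerate segment: all values are identical")
    return n * (math.log(2.0 * math.pi * var_hat) + 1.0)


print(gaussian_segment_cost([1.0, 3.0]))
```

Under these assumptions a period of length one cannot be scored, which is one way a constraint of the form n_i > l arises from the choice of cost function.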

CROPS

Following the CROPS approach, we relate the minimum value of the penalised cost function above, which is given by $$ Q\left(\mathbf{y}_{1:T},\beta\right) = \min\limits_{K, T^{\left[1\right]},\ldots,T^{\left[K\right]}} \left(\sum\limits_{t\notin\cup T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[0\right]}\right) + \sum\limits_{i=1,\ldots,K}\left\{ \min_{\theta_{t}^{\left[i\right]}}\left( \sum\limits_{t \in T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[i\right]}\right) \right) + \beta \right\} \right) $$

to the minimum cost of a partition with K anomalies given by

$$ Q_{K}\left(\mathbf{y}_{1:T}\right) = \min\limits_{T^{\left[1\right]},\ldots,T^{\left[K\right]}} \left(\sum\limits_{t\notin\cup T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[0\right]}\right) + \sum\limits_{i=1,\ldots,K}\left\{ \min_{\theta_{t}^{\left[i\right]}}\left( \sum\limits_{t \in T^{\left[i\right]}} \mathcal{C}\left(y_{t},\theta_{t}^{\left[i\right]}\right) \right) \right\} \right) $$

through $$ Q\left(\mathbf{y}_{1:T},\beta\right) = \min\limits_{K} \left( Q_{K}\left(\mathbf{y}_{1:T}\right) + K\beta \right) $$

This is exactly the form considered in the CROPS paper, so Theorem 3.1 and the CROPS algorithm still apply.
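The relation between Q(y_{1:T}, β) and Q_K(y_{1:T}) can be sketched numerically. The example below assumes the values Q_K are already available for K = 0, 1, 2 (the numbers are hypothetical, not computed from real data) and returns the minimising K for a given β. It illustrates the identity above only and is not an implementation of CROPS.

```python
from typing import Sequence, Tuple


def penalised_minimum(q_k: Sequence[float], beta: float) -> Tuple[float, int]:
    """Return (Q(beta), K*) where Q(beta) = min over K of q_k[K] + K * beta.

    q_k[K] is the unpenalised minimum cost with exactly K anomalous periods.
    Ties are resolved in favour of the smaller K.
    """
    costs = [q + k * beta for k, q in enumerate(q_k)]
    best_k = min(range(len(costs)), key=costs.__getitem__)
    return costs[best_k], best_k


# Hypothetical unpenalised costs Q_0, Q_1, Q_2 for illustration.
q_k = [27.0, 2.0, 1.5]
for beta in (0.25, 2.0, 30.0):
    q, k = penalised_minimum(q_k, beta)
    print(f"beta={beta}: Q={q}, K={k}")
```

In this sketch the selected K does not increase as β grows, so the optimal number of anomalies is a step function of the penalty. This is the structure that a penalty-range search can exploit instead of evaluating a fine grid of β values.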

Point anomalies

Select the penalty for point anomalies based on the number of standard deviations away from the mean, then run CROPS for the collective anomalies. TODO: document that this is correct.

